Searched for: collection:ir
(1 - 20 of 55)

document
Patel, T.B. (author), Scharenborg, O.E. (author)
Children’s Speech Recognition (CSR) is a challenging task due to the high variability in children’s speech patterns and the limited amount of annotated children’s speech data available. We aim to improve CSR in the often-occurring scenario in which no children’s speech data is available for training the Automatic Speech Recognition (ASR) systems....
journal article 2024
document
Herygers, Aaricia (author), Verkhodanova, Vass (author), Coler, Matt (author), Scharenborg, O.E. (author), Georges, Munir (author)
Research has shown that automatic speech recognition (ASR) systems exhibit biases against different speaker groups, e.g., based on age or gender. This paper presents an investigation into bias in recent Flemish ASR. Since Belgian Dutch, also known as Flemish, is often not included in Dutch ASR systems, a state-of-the-art ASR system...
conference paper 2023
document
Brodbeck, Christian (author), Kandylaki, Katerina Danae (author), Scharenborg, O.E. (author)
Learning to process speech in a foreign language involves learning new representations for mapping the auditory signal to linguistic structure. Behavioral experiments suggest that even listeners who are highly proficient in a non-native language experience interference from representations of their native language. However, much of the...
journal article 2023
document
Zhang, Y. (author), Herygers, Aaricia (author), Patel, T.B. (author), Yue, Z. (author), Scharenborg, O.E. (author)
Automatic speech recognition (ASR) should serve every speaker, not only the majority “standard” speakers of a language. In order to build inclusive ASR, mitigating the bias against speaker groups who speak in a “non-standard” or “diverse” way is crucial. We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system....
conference paper 2023
document
Lin, Zhaofeng (author), Patel, T.B. (author), Scharenborg, O.E. (author)
Whispering is a distinct form of speech known for its soft, breathy, and hushed characteristics, often used for private communication. The acoustic characteristics of whispered speech differ substantially from those of normally phonated speech, and the scarcity of adequate training data leads to low automatic speech recognition (ASR) performance. To...
conference paper 2023
document
Feng, S. (author), Halpern, B.M. (author), Kudina, O. (author), Scharenborg, O.E. (author)
Practice and recent evidence show that state-of-the-art (SotA) automatic speech recognition (ASR) systems do not perform equally well for all speaker groups. Many factors can cause this bias against different speaker groups. This paper, for the first time, systematically quantifies and finds speech recognition bias against gender, age, regional...
journal article 2023
document
Wang, X. (author), Xie, Qicong (author), Xie, Lei (author), Zhu, Jihua (author), Scharenborg, O.E. (author)
Automatically generating videos in which synthesized speech is synchronized with lip movements in a talking head has great potential in many human-computer interaction scenarios. In this paper, we present an automatic method to generate synchronized speech and talking-head videos on the basis of text and a single face image of an arbitrary...
journal article 2023
document
Halpern, B.M. (author), Feng, S. (author), van Son, Rob (author), van den Brekel, Michiel (author), Scharenborg, O.E. (author)
In this paper, we build and compare multiple speech systems for the automatic evaluation of the severity of a speech impairment due to oral cancer, based on spontaneous speech. To be able to build and evaluate such systems, we collected a new spontaneous oral cancer speech corpus from YouTube consisting of 124 utterances rated by 100 non...
journal article 2023
document
Dekker, Bo (author), Schouten, A.C. (author), Scharenborg, O.E. (author)
Silent speech interfaces could enable people who have lost the ability to use their voice or gestures to communicate with the external world, e.g., through decoding the person’s brain signals when imagining speech. Only a few, small databases exist that allow for the development and training of brain-computer interfaces (BCIs) that can decode...
conference paper 2023
document
Wang, Zhe (author), Wu, Shilong (author), Chen, Hang (author), He, Mao-Kui (author), Du, Jun (author), Lee, Chin-Hui (author), Chen, Jingdong (author), Watanabe, Shinji (author), Siniscalchi, Sabato Marco (author), Scharenborg, O.E. (author), Liu, Diyuan (author)
The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology in specific scenarios by promoting research into wake-up words, speaker diarization, speech recognition, and other technologies. The MISP2022 challenge has two tracks: 1) audio-visual speaker diarization (AVSD),...
conference paper 2023
document
Wilschut, Thomas (author), Sense, Florian (author), Scharenborg, O.E. (author), van Rijn, Hedderik (author)
Cognitive models of memory retrieval aim to describe human learning and forgetting over time. Such models have been successfully applied in digital systems that aid in memorizing information by adapting to the needs of individual learners. The memory models used in these systems typically measure the accuracy and latency of typed retrieval...
conference paper 2023
document
Zhang, Y. (author), Zhang, Yixuan (author), Halpern, B.M. (author), Patel, T.B. (author), Scharenborg, O.E. (author)
Automatic speech recognition (ASR) systems have seen substantial improvements in the past decade; however, not for all speaker groups. Recent research shows that bias exists against different types of speech, including non-native accents, in state-of-the-art (SOTA) ASR systems. To attain inclusive speech recognition, i.e., ASR for everyone...
journal article 2022
document
Chen, Hang (author), Zhou, Hengshun (author), Du, Jun (author), Lee, Chin-Hui (author), Chen, Jingdong (author), Watanabe, Shinji (author), Siniscalchi, Sabato Marco (author), Scharenborg, O.E. (author), Liu, Di-Yuan (author)
In this paper we discuss the rationale of the Multi-modal Information based Speech Processing (MISP) Challenge, and provide a detailed description of the data recorded, the two evaluation tasks and the corresponding baselines, followed by a summary of submitted systems and evaluation results. The MISP Challenge aims at tackling speech processing...
conference paper 2022
document
Huang, Wen-Chin (author), Halpern, B.M. (author), Violeta, Lester Phillip (author), Scharenborg, O.E. (author), Toda, Tomoki (author)
We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision-making processes and the alleviation of patient stress, and (2) data augmentation for dysarthric speech recognition. This is an especially challenging task since the...
conference paper 2022
document
Cooke, Martin (author), Scharenborg, O.E. (author), Meyer, Bernd T. (author)
When confronted with unfamiliar or novel forms of speech, listeners' word recognition performance is known to improve with exposure, but data are lacking on the fine-grained time course of adaptation. The current study aims to fill this gap by investigating the time course of adaptation to several different types of distorted speech. Keyword...
journal article 2022
document
Prananta, Luke (author), Halpern, B.M. (author), Feng, S. (author), Scharenborg, O.E. (author)
In this paper, we investigate several existing and one new state-of-the-art generative adversarial network (GAN)-based voice conversion methods for enhancing dysarthric speech to improve dysarthric speech recognition. We compare key components of existing methods as part of a rigorous ablation study to find the most effective solution to...
journal article 2022
document
Zhou, Hengshun (author), Du, Jun (author), Zou, Gongzhen (author), Nian, Zhaoxu (author), Lee, Chin Hui (author), Siniscalchi, Sabato Marco (author), Watanabe, Shinji (author), Scharenborg, O.E. (author), Chen, Jingdong (author)
In this paper, we describe and publicly release the audio-visual wake word spotting (WWS) database of the MISP2021 Challenge, which covers a range of scenarios with audio and video data collected by near-, mid-, and far-field microphone arrays and cameras, to create a shared and publicly available database for WWS. The database and the code ...
journal article 2022
document
Chen, Hang (author), Du, Jun (author), Dai, Yusheng (author), Lee, Chin Hui (author), Siniscalchi, Sabato Marco (author), Watanabe, Shinji (author), Scharenborg, O.E. (author), Chen, Jingdong (author), Yin, Bao Cai (author), Pan, Jia (author)
In this paper, we present the updated Audio-Visual Speech Recognition (AVSR) corpus of the MISP2021 challenge, a large-scale audio-visual Chinese conversational corpus consisting of 141 hours of audio and video data collected by far/middle/near microphones and far/middle cameras in 34 real-home TV rooms. To the best of our knowledge, our corpus is the first...
journal article 2022
document
Patel, T.B. (author), Scharenborg, O.E. (author)
In the diverse and multilingual land of India, Hindi is spoken as a first language by a majority of its population. Efforts have been made to obtain audio, transcriptions, dictionaries, etc., to develop speech-technology applications in Hindi. Similarly, the Gram-Vaani ASR Challenge 2022 provides spontaneous telephone speech, with...
journal article 2022
document
Hintz, Florian (author), Voeten, Cesko C. (author), Scharenborg, O.E. (author)
Listeners frequently recognize spoken words in the presence of background noise. Previous research has shown that noise reduces phoneme intelligibility and hampers spoken-word recognition – especially for non-native listeners. In the present study, we investigated how noise influences lexical competition in both the non-native and the native...
journal article 2022