- Brodbeck, Christian; Kandylaki, Katerina Danae; Scharenborg, O.E. (journal article, 2023). Learning to process speech in a foreign language involves learning new representations for mapping the auditory signal to linguistic structure. Behavioral experiments suggest that even listeners who are highly proficient in a non-native language experience interference from representations of their native language. However, much of the evidence...
- Wang, Zhe; Wu, Shilong; Chen, Hang; He, Mao-Kui; Du, Jun; Lee, Chin-Hui; Chen, Jingdong; Watanabe, Shinji; Siniscalchi, Sabato Marco; Scharenborg, O.E.; Liu, Diyuan (conference paper, 2023). The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology to specific scenarios by promoting research into wake-up words, speaker diarization, speech recognition, and other technologies. The MISP2022 challenge has two tracks: 1) audio-visual speaker diarization (AVSD),...
- Dekker, Bo; Schouten, A.C.; Scharenborg, O.E. (conference paper, 2023). Silent speech interfaces could enable people who have lost the ability to use their voice or gestures to communicate with the external world, e.g., by decoding the person’s brain signals when imagining speech. Only a few small databases exist that allow for the development and training of brain-computer interfaces (BCIs) that can decode...
- Wilschut, Thomas; Sense, Florian; Scharenborg, O.E.; van Rijn, Hedderik (conference paper, 2023). Cognitive models of memory retrieval aim to describe human learning and forgetting over time. Such models have been successfully applied in digital systems that aid in memorizing information by adapting to the needs of individual learners. The memory models used in these systems typically measure the accuracy and latency of typed retrieval...
- Wang, X.; Xie, Qicong; Xie, Lei; Zhu, Jihua; Scharenborg, O.E. (journal article, 2023). Automatically generating videos in which synthesized speech is synchronized with lip movements in a talking head has great potential in many human-computer interaction scenarios. In this paper, we present an automatic method to generate synchronized speech and talking-head videos on the basis of text and a single face image of an arbitrary...
- Halpern, B.M.; Feng, S.; van Son, Rob; van den Brekel, Michiel; Scharenborg, O.E. (journal article, 2023). In this paper, we build and compare multiple speech systems for the automatic evaluation of the severity of a speech impairment due to oral cancer, based on spontaneous speech. To be able to build and evaluate such systems, we collected a new spontaneous oral cancer speech corpus from YouTube consisting of 124 utterances rated by 100 non...
- Feng, S.; Halpern, B.M.; Kudina, O.; Scharenborg, O.E. (journal article, 2023). Practice and recent evidence show that state-of-the-art (SotA) automatic speech recognition (ASR) systems do not perform equally well for all speaker groups. Many factors can cause this bias against different speaker groups. This paper, for the first time, systematically quantifies and finds speech recognition bias against gender, age, regional...
- Karaminis, Themis; Hintz, Florian; Scharenborg, O.E. (journal article, 2022). Oral communication often takes place in noisy environments, which challenge spoken-word recognition. Previous research has suggested that the presence of background noise extends the number of candidate words competing with the target word for recognition and that this extension affects the time course and accuracy of spoken-word recognition...
- Halpern, B.M.; Feng, S.; van Son, Rob; van den Brekel, Michiel; Scharenborg, O.E. (journal article, 2022). In this paper, we introduce a new corpus of oral cancer speech and present our study on the automatic recognition and analysis of oral cancer speech. A two-hour English oral cancer speech dataset is collected from YouTube. Formulating this as a low-resource oral cancer ASR task, we investigate three acoustic modelling approaches that previously...
- Cooke, Martin; Scharenborg, O.E.; Meyer, Bernd T. (journal article, 2022). When listeners are confronted with unfamiliar or novel forms of speech, their word recognition performance is known to improve with exposure, but data are lacking on the fine-grained time course of adaptation. The current study aims to fill this gap by investigating the time course of adaptation to several different types of distorted speech. Keyword...
- Merkx, D.G.M.; Scholten, Sebastiaan; Frank, Stefan L.; Ernestus, Mirjam; Scharenborg, O.E. (journal article, 2022). Many computational models of speech recognition assume that the set of target words is already given. This implies that these models learn to recognise speech in a biologically unrealistic manner, i.e., with prior lexical knowledge and explicit supervision. In contrast, visually grounded speech models learn to recognise speech without prior...
- Zhou, Hengshun; Du, Jun; Zou, Gongzhen; Nian, Zhaoxu; Lee, Chin Hui; Siniscalchi, Sabato Marco; Watanabe, Shinji; Scharenborg, O.E.; Chen, Jingdong (journal article, 2022). In this paper, we describe and publicly release the audio-visual wake word spotting (WWS) database in the MISP2021 Challenge, which covers a range of scenarios, with audio and video data collected by near-, mid-, and far-field microphone arrays and cameras, to create a shared and publicly available database for WWS. The database and the code ...
- Chen, Hang; Du, Jun; Dai, Yusheng; Lee, Chin Hui; Siniscalchi, Sabato Marco; Watanabe, Shinji; Scharenborg, O.E.; Chen, Jingdong; Yin, Bao Cai; Pan, Jia (journal article, 2022). In this paper, we present the updated Audio-Visual Speech Recognition (AVSR) corpus of the MISP2021 challenge, a large-scale audio-visual Chinese conversational corpus consisting of 141 hours of audio and video data collected by far/middle/near microphones and far/middle cameras in 34 real-home TV rooms. To the best of our knowledge, our corpus is the first...
- Patel, T.B.; Scharenborg, O.E. (journal article, 2022). In the diverse and multilingual land of India, Hindi is spoken as a first language by a majority of its population. Efforts are being made to obtain audio, transcriptions, dictionaries, and other resources to develop speech-technology applications in Hindi. Similarly, the Gram-Vaani ASR Challenge 2022 provides spontaneous telephone speech, with...
- Strauß, Antje; Wu, Tongyu; McQueen, James M.; Scharenborg, O.E.; Hintz, Florian (journal article, 2022). Successful spoken-word recognition relies on interplay between lexical and sublexical processing. Previous research demonstrated that listeners readily shift between more lexically-biased and more sublexically-biased modes of processing in response to the situational context in which language comprehension takes place. Recognizing words in...
- Zhan, Juhong; Jiang, Yue; Cieri, Christopher; Liberman, Mark; Yuan, Jiahong; Chen, Yiya; Scharenborg, O.E. (conference paper, 2022). This paper describes our use of mixed incentives and the citizen science portal LanguageARC to prepare, collect, and quality-control a large corpus of object namings, with the purpose of providing speech data to document the under-represented Guanzhong dialect of Chinese, spoken in Shaanxi province in the environs of Xi’an.
- Huang, Wen-Chin; Halpern, B.M.; Violeta, Lester Phillip; Scharenborg, O.E.; Toda, Tomoki (conference paper, 2022). We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision-making processes and the alleviation of patient stress, and (2) data augmentation for dysarthric speech recognition. This is an especially challenging task since the...
- Żelasko, Piotr; Feng, S.; Moro Velázquez, Laureano; Abavisani, Ali; Bhati, Saurabhchand; Scharenborg, O.E.; Hasegawa-Johnson, Mark; Dehak, Najim (journal article, 2022). The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script or for which the phone inventories remain unknown. Past work has explored multilingual training, transfer learning, and zero-shot learning in order...
- Prananta, Luke; Halpern, B.M.; Feng, S.; Scharenborg, O.E. (journal article, 2022). In this paper, we investigate several existing voice conversion methods and a new state-of-the-art generative adversarial network (GAN)-based voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. We compare key components of existing methods as part of a rigorous ablation study to find the most effective solution to...
- Zhang, Y.; Zhang, Yixuan; Halpern, B.M.; Patel, T.B.; Scharenborg, O.E. (journal article, 2022). Automatic speech recognition (ASR) systems have seen substantial improvements in the past decade; however, not for all speaker groups. Recent research shows that bias exists against different types of speech, including non-native accents, in state-of-the-art (SOTA) ASR systems. To attain inclusive speech recognition, i.e., ASR for everyone...