Search results | TU Delft Repositories

Searched for: subject:"process"

(1 - 8 of 8)

document: Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation
Lin, Zhaofeng (author), Patel, T.B. (author), Scharenborg, O.E. (author)
Whispering is a distinct form of speech known for its soft, breathy, and hushed characteristics, often used for private communication. The acoustic characteristics of whispered speech differ substantially from normally phonated speech and the scarcity of adequate training data leads to low automatic speech recognition (ASR) performance. To...
conference paper 2023

document: Recognizing non-native spoken words in background noise increases interference from the native language
Hintz, Florian (author), Voeten, Cesko C. (author), Scharenborg, O.E. (author)
Listeners frequently recognize spoken words in the presence of background noise. Previous research has shown that noise reduces phoneme intelligibility and hampers spoken-word recognition – especially for non-native listeners. In the present study, we investigated how noise influences lexical competition in both the non-native and the native...
journal article 2022

document: Generating Images from Spoken Descriptions
Wang, X. (author), Qiao, T. (author), Zhu, Jihua (author), Hanjalic, A. (author), Scharenborg, O.E. (author)
Text-based technologies, such as text translation from one language to another, and image captioning, are gaining popularity. However, approximately half of the world's languages are estimated to be lacking a commonly used written form. Consequently, these languages cannot benefit from text-based technologies. This paper presents 1) a new...
journal article 2021

document: Synthesizing Spoken Descriptions of Images
Wang, X. (author), van der Hout, Justin (author), Zhu, Jihua (author), Hasegawa-Johnson, Mark (author), Scharenborg, O.E. (author)
Image captioning technology has great potential in many scenarios. However, current text-based image captioning methods cannot be applied to approximately half of the world's languages due to these languages’ lack of a written form. To solve this problem, recently the image-to-speech task was proposed, which generates spoken descriptions of...
journal article 2021

document: Speech technology for unwritten languages
Scharenborg, O.E. (author), Besacier, Laurent (author), Black, Alan W. (author), Hasegawa-Johnson, Mark (author), Metze, Florian (author), Neubig, Graham (author), Stueker, Sebastian (author), Godard, Pierre (author), Mueller, M (author)
Speech technology plays an important role in our everyday life. Among others, speech is used for human-computer interaction, for instance for information retrieval and on-line shopping. In the case of an unwritten language, however, speech technology is unfortunately difficult to create, because it cannot be created by the standard...
journal article 2020

document: The representation of speech and its processing in the human brain and deep neural networks
Scharenborg, O.E. (author)
For most languages in the world and for speech that deviates from the standard pronunciation, not enough (annotated) speech data is available to train an automatic speech recognition (ASR) system. Moreover, human intervention is needed to adapt an ASR system to a new language or type of speech. Human listeners, on the other hand, are able to...
conference paper 2019

document: Why listening in background noise is harder in a non-native language than in a native language: A review
Scharenborg, O.E. (author), van Os, Marjolein (author)
There is ample evidence that recognising words in a non-native language is more difficult than in a native language, even for those with a high proficiency in the non-native language involved, and particularly in the presence of background noise. Why is this the case? To answer this question, this paper provides a systematic review of the...
review 2019

document: The neural correlates underlying lexically-guided perceptual learning
Scharenborg, O.E. (author), Koemans, Jiska (author), Smith, Cybelle (author), Hasegawa-Johnson, Mark (author), Federmeier, Kara D. (author)
There is ample evidence showing that listeners are able to quickly adapt their phoneme classes to ambiguous sounds using a process called lexically-guided perceptual learning. This paper presents the first attempt to examine the neural correlates underlying this process. Specifically, we compared the brain’s responses to ambiguous [f/s] sounds...
conference paper 2019
