- Ji, Hang (author). Master thesis, 2022.
  In this thesis, we analyzed and compared speech representations extracted from different frozen self-supervised learning (SSL) pre-trained speech models on their ability to capture articulatory feature (AF) information and how well they predict phone recognition performance in within-language and cross-language scenarios. Specifically, ...
- Żelasko, Piotr (author), Feng, S. (author), Moro Velázquez, Laureano (author), Abavisani, Ali (author), Bhati, Saurabhchand (author), Scharenborg, O.E. (author), Hasegawa-Johnson, Mark (author), Dehak, Najim (author). Journal article, 2022.
  The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past work has explored multilingual training, transfer learning, and zero-shot learning in order ...
- Wang, X. (author), Tian, Tian (author), Zhu, Jihua (author), Scharenborg, O.E. (author). Conference paper, 2021.
  In the case of unwritten languages, acoustic models cannot be trained in the standard way, i.e., using speech and textual transcriptions. Recently, several methods have been proposed to learn speech representations using images, i.e., using visual grounding. Existing studies have focused on scene images. Here, we investigate whether fine ...
- Tian, Tian (author). Master thesis, 2020.
  Visually grounded speech representation learning has been shown to be useful in the field of speech representation learning. Studies of visually grounded speech embeddings have adopted the speech-image cross-modal retrieval task to evaluate the models, since the cross-modal retrieval task allows jointly learning both modalities and finding their ...
- Scharenborg, O.E. (author), van der Gouw, Nikki (author), Larson, M.A. (author), Marchiori, Elena (author). Conference paper, 2019.
  In this paper, we investigate the connection between how people understand speech and how speech is understood by a deep neural network. A naïve, general feed-forward deep neural network was trained for the task of vowel/consonant classification. Subsequently, the representations of the speech signal in the different hidden layers of the DNN ...
  A minimal code sketch of this layer-analysis setup follows the listing.
- Scharenborg, O.E. (author). Conference paper, 2019.
  For most languages in the world, and for speech that deviates from the standard pronunciation, not enough (annotated) speech data is available to train an automatic speech recognition (ASR) system. Moreover, human intervention is needed to adapt an ASR system to a new language or type of speech. Human listeners, on the other hand, are able to ...
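The Scharenborg, van der Gouw, Larson, and Marchiori (2019) entry outlines a concrete procedure: train a plain feed-forward network on vowel/consonant classification and then inspect the speech representations in its hidden layers. The sketch below illustrates that kind of analysis only; the toy data, feature dimensionality, network shape, and the `return_hidden` flag are assumptions for illustration, not the paper's actual configuration or code.

```python
# Minimal sketch (illustrative, not the paper's code): train a small
# feed-forward net on vowel/consonant labels, then read out the
# hidden-layer representations for later analysis.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed toy data: 1000 frames of 13-dim acoustic features (e.g. MFCCs)
# with binary vowel (1) / consonant (0) labels.
acoustic_features = torch.randn(1000, 13)
labels = torch.randint(0, 2, (1000,))

class FeedForwardNet(nn.Module):
    def __init__(self, in_dim=13, hidden_dim=64, n_classes=2):
        super().__init__()
        self.hidden1 = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.hidden2 = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.out = nn.Linear(hidden_dim, n_classes)

    def forward(self, x, return_hidden=False):
        h1 = self.hidden1(x)
        h2 = self.hidden2(h1)
        logits = self.out(h2)
        if return_hidden:
            return logits, (h1, h2)  # per-layer representations of the input
        return logits

model = FeedForwardNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Train on the vowel/consonant classification task.
for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(acoustic_features), labels)
    loss.backward()
    optimizer.step()

# After training, extract each hidden layer's representation of the speech
# signal; these are the objects one would compare against human listeners.
with torch.no_grad():
    _, (h1, h2) = model(acoustic_features, return_hidden=True)
print(h1.shape, h2.shape)  # torch.Size([1000, 64]) torch.Size([1000, 64])
```

The `return_hidden` flag is one simple way to expose intermediate activations; forward hooks would serve the same purpose without changing the model's interface.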