- Wang, X.; Tian, Tian; Zhu, Jihua; Scharenborg, O.E. (conference paper, 2021)
  In the case of unwritten languages, acoustic models cannot be trained in the standard way, i.e., using speech and textual transcriptions. Recently, several methods have been proposed to learn speech representations using images, i.e., using visual grounding. Existing studies have focused on scene images. Here, we investigate whether fine...
- Scholten, Sebastiaan; Merkx, Danny; Scharenborg, O.E. (conference paper, 2021)
  We investigated word recognition in a Visually Grounded Speech model. The model has been trained on pairs of images and spoken captions to create visually grounded embeddings, which can be used for speech-to-image retrieval and vice versa. We investigate whether such a model can be used to recognise words by embedding isolated words and using...
- Tian, Tian (master thesis, 2020)
  Visual grounding has been shown to be useful in the field of speech representation learning. Studies of learning visually grounded speech embeddings have adopted the speech-image cross-modal retrieval task to evaluate the models, since the cross-modal retrieval task allows jointly learning both modalities and finding their...
- Scholten, J.S.M. (master thesis, 2020)
  A Visually Grounded Speech model is a neural model which is trained to embed image-caption pairs closely together in a common embedding space. As a result, such a model can retrieve semantically related images given a speech caption and vice versa. The purpose of this research is to investigate whether and how a Visually Grounded Speech model...