- Dudzik, B.J.W. (author), Matej Hrkalovic, T. (author), Küster, Dennis (author), St-Onge, David (author), Putze, Felix (author), Devillers, Laurence (author)
  The ability to automatically infer relevant aspects of human users' thoughts and feelings is crucial for technologies to intelligently adapt their behaviors in complex interactions. Research on multimodal analysis has demonstrated the potential of technology to provide such estimates for a broad range of internal states and processes. However...
  conference paper 2023
- van der Heide, Niels (author)
  This study researches how station-based bike-sharing can be implemented in strategic transport models. Implementing station-based bike-sharing is challenging, as it requires combining two modelling techniques. The first is modelling transport tours to make sure that a person who rents a bike returns the bike to the same station. The...
  master thesis 2021
- Wang, X. (author), van der Hout, Justin (author), Zhu, Jihua (author), Hasegawa-Johnson, Mark (author), Scharenborg, O.E. (author)
  Image captioning technology has great potential in many scenarios. However, current text-based image captioning methods cannot be applied to approximately half of the world's languages due to these languages’ lack of a written form. To solve this problem, recently the image-to-speech task was proposed, which generates spoken descriptions of...
  journal article 2021
- Wang, X. (author), Tian, Tian (author), Zhu, Jihua (author), Scharenborg, O.E. (author)
  In the case of unwritten languages, acoustic models cannot be trained in the standard way, i.e., using speech and textual transcriptions. Recently, several methods have been proposed to learn speech representations using images, i.e., using visual grounding. Existing studies have focused on scene images. Here, we investigate whether fine...
  conference paper 2021
- Wang, X. (author), Qiao, T. (author), Zhu, Jihua (author), Hanjalic, A. (author), Scharenborg, O.E. (author)
  Text-based technologies, such as text translation from one language to another, and image captioning, are gaining popularity. However, approximately half of the world's languages are estimated to be lacking a commonly used written form. Consequently, these languages cannot benefit from text-based technologies. This paper presents 1) a new...
  journal article 2021
- Tian, Tian (author)
  Visually grounded speech representation learning has been shown to be useful in the field of speech representation learning. Studies on learning visually grounded speech embeddings have adopted the speech-image cross-modal retrieval task to evaluate the models, since the cross-modal retrieval task makes it possible to jointly learn both modalities and find their...
  master thesis 2020
- Wang, X. (author), Qiao, T. (author), Zhu, Jihua (author), Hanjalic, A. (author), Scharenborg, O.E. (author)
  An estimated half of the world’s languages do not have a written form, making it impossible for these languages to benefit from any existing text-based technologies. In this paper, a speech-to-image generation (S2IG) framework is proposed which translates speech descriptions to photo-realistic images without using any text information, thus...
  conference paper 2020