Searched for: +
(1 - 4 of 4)
document
Wang, Zhe (author), Wu, Shilong (author), Chen, Hang (author), He, Mao-Kui (author), Du, Jun (author), Lee, Chin-Hui (author), Chen, Jingdong (author), Watanabe, Shinji (author), Siniscalchi, Sabato Marco (author), Scharenborg, O.E. (author), Liu, Diyuan (author)
The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology in specific scenarios by promoting the research into wake-up words, speaker diarization, speech recognition, and other technologies. The MISP2022 challenge has two tracks: 1) audio-visual speaker diarization (AVSD),...
conference paper 2023
document
Wang, X. (author), Feng, S. (author), Zhu, Jihua (author), Hasegawa-Johnson, Mark (author), Scharenborg, O.E. (author)
This paper proposes a new model, referred to as the show and speak (SAS) model that, for the first time, is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes. The basic structure of SAS is an encoder-decoder architecture that takes an image as input and predicts the spectrogram of speech that...
conference paper 2021
document
Wang, X. (author), Tian, Tian (author), Zhu, Jihua (author), Scharenborg, O.E. (author)
In the case of unwritten languages, acoustic models cannot be trained in the standard way, i.e., using speech and textual transcriptions. Recently, several methods have been proposed to learn speech representations using images, i.e., using visual grounding. Existing studies have focused on scene images. Here, we investigate whether fine...
conference paper 2021
document
Wang, X. (author), Qiao, T. (author), Zhu, Jihua (author), Hanjalic, A. (author), Scharenborg, O.E. (author)
An estimated half of the world’s languages do not have a written form, making it impossible for these languages to benefit from any existing text-based technologies. In this paper, a speech-to-image generation (S2IG) framework is proposed which translates speech descriptions to photo-realistic images without using any text information, thus...
conference paper 2020
Searched for: +
(1 - 4 of 4)