Search results | TU Delft Repositories

Searched for: +

(1 - 5 of 5)

document: The Multimodal Information Based Speech Processing (Misp) 2022 Challenge: Audio-Visual Diarization And Recognition
Wang, Zhe (author), Wu, Shilong (author), Chen, Hang (author), He, Mao-Kui (author), Du, Jun (author), Lee, Chin-Hui (author), Chen, Jingdong (author), Watanabe, Shinji (author), Siniscalchi, Sabato Marco (author), Scharenborg, O.E. (author), Liu, Diyuan (author)
The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology in specific scenarios by promoting the research into wake-up words, speaker diarization, speech recognition, and other technologies. The MISP2022 challenge has two tracks: 1) audio-visual speaker diarization (AVSD),...
conference paper 2023

document: Mitigating bias against non-native accents
Zhang, Y. (author), Zhang, Yixuan (author), Halpern, B.M. (author), Patel, T.B. (author), Scharenborg, O.E. (author)
Automatic speech recognition (ASR) systems have seen substantial improvements in the past decade; however, not for all speaker groups. Recent research shows that bias exists against different types of speech, including non-native accents, in state-of-the-art (SOTA) ASR systems. To attain inclusive speech recognition, i.e., ASR for everyone...
journal article 2022

document: Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis
Chen, Hang (author), Du, Jun (author), Dai, Yusheng (author), Lee, Chin Hui (author), Siniscalchi, Sabato Marco (author), Watanabe, Shinji (author), Scharenborg, O.E. (author), Chen, Jingdong (author), Yin, Bao Cai (author), Pan, Jia (author)
In this paper, we present the updated Audio-Visual Speech Recognition (AVSR) corpus of MISP2021 challenge, a large-scale audio-visual Chinese conversational corpus consisting of 141h audio and video data collected by far/middle/near microphones and far/middle cameras in 34 real-home TV rooms. To our best knowledge, our corpus is the first...
journal article 2022

document: Deriving content-specific measures of room acoustic perception using a binaural, nonlinear auditory model
Van Dorp Schuitman, J. (author), De Vries, D. (author), Lindau, A. (author)
Acousticians generally assess the acoustic qualities of a concert hall or any other room using impulse response-based measures such as the reverberation time, clarity index, and others. These parameters are used to predict perceptual attributes related to the acoustic qualities of the room. Various studies show that these physical measures are...
journal article 2013

document: An audio-visual corpus for multimodal speech recognition in Dutch language
Wojdel, J. (author), Wiggers, P. (author), Rothkrantz, L.J.M. (author)
This paper describes the gathering and availability of an audio-visual speech corpus for Dutch language. The corpus was prepared with the multi-modal speech recognition in mind and it is currently used in our research on lip-reading and bimodal speech recognition. It contains the prompts used also in the well-established POLYPHONE corpus and...
conference paper 2002

Searched for: +

(1 - 5 of 5)