Dysarthric Speech Recognition Fusing Large Pre-Trained Model Extracted Acoustic Features With Articulatory Data

Master Thesis (2025)
Author(s)

X. Xu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Zhengjun Yue – Mentor (TU Delft - Multimedia Computing)

O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
03-03-2025
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Dysarthric speech recognition is challenging due to the speech variability caused by neurological disorders. This study explores integrating articulatory features with features extracted by large pre-trained acoustic models (e.g., WavLM, Whisper) to improve recognition performance. Different fusion strategies, including concatenation and cross-attention mechanisms, are also compared in this work. Experimental results show that articulatory features can enhance WavLM-extracted features, reducing the WER for moderate and mild severity levels. A t-SNE analysis reveals how articulatory features influence the feature representations. These findings highlight the potential of multimodal fusion for improving dysarthric ASR systems.
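The two fusion strategies named in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the thesis implementation: the feature dimensions (1024 for WavLM-style frame features, 12 for articulatory trajectories) and the single-head, bias-free attention projections are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def concat_fusion(acoustic, articulatory):
    # Concatenation fusion: truncate both streams to the shared number of
    # frames, then join them along the feature axis.
    T = min(len(acoustic), len(articulatory))
    return np.concatenate([acoustic[:T], articulatory[:T]], axis=-1)

def cross_attention_fusion(acoustic, articulatory, W_q, W_k, W_v):
    # Cross-attention fusion (single head, illustrative): acoustic frames
    # act as queries over articulatory keys/values, so each acoustic frame
    # attends to the articulatory frames most relevant to it.
    Q = acoustic @ W_q          # (T_a, d)
    K = articulatory @ W_k      # (T_r, d)
    V = articulatory @ W_v      # (T_r, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V  # (T_a, d)

# Toy usage with assumed dimensions.
rng = np.random.default_rng(0)
ac = rng.normal(size=(50, 1024))   # e.g., WavLM frame features
ar = rng.normal(size=(50, 12))     # e.g., articulatory trajectories
fused_cat = concat_fusion(ac, ar)                      # (50, 1036)
d = 64
W_q = rng.normal(size=(1024, d))
W_k = rng.normal(size=(12, d))
W_v = rng.normal(size=(12, d))
fused_att = cross_attention_fusion(ac, ar, W_q, W_k, W_v)  # (50, 64)
```

Concatenation preserves both streams verbatim but requires frame-level alignment; cross-attention learns a soft alignment, which is one reason the two strategies are worth comparing on dysarthric speech, where timing varies strongly across severity levels.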

Files

Xinrui_s_thesis_1_.pdf
(pdf | 1.7 MB)
License info not available