Improving State-of-the-Art ASR Systems for Speakers with Dysarthria
Applying Low-Rank Adaptation Transfer Learning to Whisper
M. Günther (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Z. Yue – Mentor (TU Delft - Multimedia Computing)
Y. Zhang – Mentor (TU Delft - Multimedia Computing)
Thomas Durieux – Graduation committee member (TU Delft - Software Engineering)
Abstract
Dysarthria is a speech disorder that limits an individual's ability to articulate clearly, owing to weakening of the muscles involved in speech. Despite recent advances in Automatic Speech Recognition (ASR), recognising dysarthric speech remains a significant challenge because of the limited availability of dysarthric speech data, high speaker variability, and the mismatch between typical and dysarthric speech patterns. This paper addresses these challenges by applying transfer learning and Low-Rank Adaptation (LoRA) to improve the performance of the state-of-the-art ASR model Whisper on dysarthric speech. By fine-tuning Whisper on the TORGO dataset, this study adapts the pre-trained models to better recognise dysarthric speech patterns, thereby reducing Word Error Rates (WER) and improving accessibility for individuals with speech impairments. Experimental results indicate that this approach can improve recognition performance: the Large-V2, Large-V3, and corresponding distilled models all achieved lower WER after fine-tuning, with Large-V3 achieving the greatest relative WER reduction of 22.65%.
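
The LoRA-based adaptation summarised above can be illustrated with a short sketch. The snippet below is a minimal, hedged example assuming the Hugging Face transformers and peft libraries; the checkpoint name, LoRA rank, target modules, and training hyperparameters are illustrative assumptions, not necessarily the values used in this study, and the TORGO preprocessing is omitted.

    # Minimal sketch: LoRA fine-tuning of Whisper for dysarthric speech (illustrative).
    # Assumes Hugging Face `transformers` and `peft`; hyperparameters are placeholders.
    from transformers import (
        WhisperForConditionalGeneration,
        WhisperProcessor,
        Seq2SeqTrainingArguments,
    )
    from peft import LoraConfig, get_peft_model

    checkpoint = "openai/whisper-large-v3"  # one of the base models adapted in the paper
    processor = WhisperProcessor.from_pretrained(checkpoint, language="en", task="transcribe")
    model = WhisperForConditionalGeneration.from_pretrained(checkpoint)

    # Inject low-rank adapters into the attention projections; the pre-trained
    # weights stay frozen and only the small adapter matrices are trained.
    lora_config = LoraConfig(
        r=32,                                 # adapter rank (assumed value)
        lora_alpha=64,
        target_modules=["q_proj", "v_proj"],  # Whisper attention projection layers
        lora_dropout=0.05,
        bias="none",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()        # typically a small fraction of all parameters

    # Training would proceed with a standard seq2seq setup on preprocessed TORGO
    # utterances (log-Mel input features plus tokenised transcripts), e.g. via
    # Seq2SeqTrainer; dataset preparation and the data collator are omitted here.
    training_args = Seq2SeqTrainingArguments(
        output_dir="whisper-large-v3-torgo-lora",  # hypothetical output path
        per_device_train_batch_size=8,
        learning_rate=1e-4,
        num_train_epochs=5,
        fp16=True,
    )

Because only the low-rank adapter matrices are updated, the number of trainable parameters stays small, which is one reason this kind of parameter-efficient fine-tuning is attractive when, as here, the available dysarthric speech data is limited.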