Evaluating Alternative Metrics for Dysarthric Speech Recognition
Assessing the Effectiveness of Different Evaluation Metrics in Dysarthric Speech Recognition Systems Across Various Severities
H.C. Nguyen Duc (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Z. Yue – Mentor (TU Delft - Multimedia Computing)
Yuanyuan Zhang – Mentor (TU Delft - Multimedia Computing)
Thomas Durieux – Coach (TU Delft - Software Engineering)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Dysarthria is a motor speech disorder resulting in slurred or slow speech that can be difficult to understand. This research paper evaluates how effectively alternative automatic speech recognition (ASR) metrics, such as character error rate (CER), Jaro-Winkler distance, and BERTScore, assess performance on dysarthric speech, which the commonly used word error rate (WER) often measures inadequately. Using the TORGO database, which covers a range of dysarthria severities, we analyze the chosen evaluation metrics on the output of the Whisper and wav2vec 2.0 ASR systems to understand how well they reflect the actual recognition challenges posed by such atypical speech patterns. Our findings reveal that Whisper generally outperforms wav2vec 2.0, particularly on sentence utterances, by handling complex speech patterns effectively and preserving semantic integrity. The analysis also shows that recognition performance on single-word utterances correlates strongly with dysarthria severity, whereas performance on sentence utterances correlates less strongly, because the additional linguistic context mitigates the effect of the disorder.
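To make the compared string-level metrics concrete, the following is a minimal plain-Python sketch of WER, CER, and Jaro-Winkler similarity. It is an illustration under stated assumptions, not the paper's own tooling: the function names are hypothetical, no text normalization (lowercasing, punctuation stripping) is applied, the Jaro-Winkler scaling factor `p = 0.1` and 4-character prefix cap are the conventional defaults, and Jaro-Winkler distance is taken as `1 - similarity`. BERTScore is omitted because it requires a pretrained language model.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def jaro_winkler_similarity(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler similarity in [0, 1]; the distance is 1 minus this value."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters match only if equal and within this window of each other.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of matched characters that appear out of order.
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    jaro = (matches / len(s1) + matches / len(s2) + (matches - transpositions) / matches) / 3
    # Winkler boost for a shared prefix of up to max_prefix characters.
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


if __name__ == "__main__":
    print(wer("the cat sat", "the cat sit"))             # 1 of 3 words wrong
    print(cer("kitten", "sitting"))                       # 3 character edits over 6 characters
    print(jaro_winkler_similarity("MARTHA", "MARHTA"))
```

The example hints at why single-word utterances behave differently: with a one-word reference, a misrecognized word yields a WER of 1 regardless of how close the hypothesis is, whereas CER and Jaro-Winkler still give partial credit for near misses such as "sit" versus "sat".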