Evaluating Alternative Metrics for Dysarthric Speech Recognition

Assessing the Effectiveness of Different Evaluation Metrics in Dysarthric Speech Recognition Systems Across Various Severities

More Info
expand_more

Abstract

Dysarthria is a motor speech disorder resulting in slurred or slow speech that can be difficult to understand. This re- search paper evaluates the effectiveness of various metrics for automatic speech recognition (ASR), such as character error rate (CER), Jaro-Winkler distance, and BERTscore, in assessing performance specifically for dysarthric speech, which is often inadequately measured by the commonly used word error rate (WER). Using the TORGO database, which includes a range of dysarthria severities, we analyze the performance of chosen evaluation metrics with the Whisper and wav2vec 2.0 ASR systems to understand how they reflect the true speech recognition challenges presented by such atypical speech pat- terns. Our findings reveal that Whisper generally outperforms wav2vec 2.0, particularly in sentence utterances, by effectively managing complex speech patterns and maintaining semantic integrity. The analysis highlights that single-word utterances strongly correlate with dysarthria severity, while sentence utterances show a lesser correlation due to the mitigating effect of additional linguistic context.

Files