Evaluating Alternative Metrics for Dysarthric Speech Recognition

Assessing the Effectiveness of Different Evaluation Metrics in Dysarthric Speech Recognition Systems Across Various Severities

Bachelor Thesis (2024)

Author(s)

H.C. Nguyen Duc (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Z. Yue – Mentor (TU Delft - Multimedia Computing)

Yuanyuan Zhang – Mentor (TU Delft - Multimedia Computing)

Thomas Durieux – Coach (TU Delft - Software Engineering)

Faculty

Electrical Engineering, Mathematics and Computer Science

Evaluation metrics, Automatic Speech Recognition, Dysarthria, Dysarthric speech recognition

To reference this document use:

https://resolver.tudelft.nl/uuid:a78e98fd-b467-4903-b10b-d76d14c90cc7

Publication Year

2024

Language

English

Graduation Date

25-06-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Dysarthria is a motor speech disorder that results in slurred or slow speech that can be difficult to understand. This research paper evaluates how well various metrics for automatic speech recognition (ASR), such as character error rate (CER), Jaro-Winkler distance, and BERTScore, assess performance on dysarthric speech, which the commonly used word error rate (WER) often measures inadequately. Using the TORGO database, which covers a range of dysarthria severities, we analyze how the chosen evaluation metrics behave with the Whisper and wav2vec 2.0 ASR systems, to understand how well they reflect the recognition challenges posed by such atypical speech patterns. Our findings show that Whisper generally outperforms wav2vec 2.0, particularly on sentence utterances, by handling complex speech patterns effectively and maintaining semantic integrity. The analysis also shows that recognition performance on single-word utterances correlates strongly with dysarthria severity, while performance on sentence utterances correlates less strongly because additional linguistic context mitigates recognition errors.
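
To illustrate why WER can be a coarse measure for single-word utterances, the following minimal Python sketch computes WER and CER from a plain Levenshtein edit distance. It is not the thesis's implementation, which is not shown on this page. The function names (edit_distance, wer, cer) and the example strings are hypothetical and are not taken from TORGO, and no text normalization (casing, punctuation) is applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical single-word utterance: one wrong character.
    print(wer("sheet", "sheep"))  # 1.0 (the only word counts as fully wrong)
    print(cer("sheet", "sheep"))  # 0.2 (one of five characters differs)
    # Hypothetical sentence utterance: one wrong word out of four.
    print(wer("the quick brown fox", "the quick brown box"))  # 0.25
```

In this sketch, a near miss on a single word is scored by WER as a complete failure, while CER reflects that most of the word was recognized. That difference in granularity is the kind of behavior the abstract refers to when it says WER often measures dysarthric speech inadequately.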

Files

NguyenDuc_Final.pdf

(pdf | 0.443 Mb)

License info not available