Automatic Dysarthria Severity Assessment using Whisper-extracted Features
Evaluating ML architectures for dysarthria severity assessment on TORGO and MSDM
C. Charlesworth (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Z. Yue – Mentor (TU Delft - Multimedia Computing)
Y. Zhang – Mentor (TU Delft - Multimedia Computing)
Thomas Durieux – Graduation committee member (TU Delft - Software Engineering)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Dysarthria is a speech disorder commonly caused by neurological conditions such as stroke, cerebral palsy, and amyotrophic lateral sclerosis (ALS). The severity of dysarthria strongly influences which treatment is appropriate for a patient. However, assessing severity is a time-consuming process that requires a trained speech therapist. This work therefore explores a variety of classifier architectures for automatic dysarthria severity assessment using Whisper encodings. The datasets used were MSDM and TORGO, and the classifier architectures implemented included a Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) variants. On both datasets, the Gated Recurrent Unit (GRU) network achieved the best performance, with 97.21% accuracy on MSDM and 97.47% on TORGO.
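To make the pipeline in the abstract concrete, the following is a minimal sketch of a GRU classifier running over a sequence of per-frame feature vectors, as one might do with Whisper encoder outputs. It is not the thesis implementation. The class name `TinyGRUClassifier`, the toy dimensions, the choice of four severity classes, and the random untrained weights are all assumptions made for illustration, and random vectors stand in for real Whisper encodings (which are much wider, for example 512 dimensions per frame for the base model). Bias terms are omitted for brevity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGRUClassifier:
    """Hypothetical, untrained single-layer GRU followed by a softmax layer."""

    def __init__(self, feat_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Input and recurrent weights for the update (z), reset (r)
        # and candidate (n) computations.
        self.W = {g: rand_matrix(hidden_dim, feat_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        self.W_out = rand_matrix(num_classes, hidden_dim, rng)

    def step(self, x, h):
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        # The reset gate scales the previous state before the recurrent
        # projection; some libraries apply it after the projection instead.
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b)
             for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # The update gate interpolates between the old state and the candidate.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def predict_proba(self, frames):
        h = [0.0] * self.hidden_dim
        for x in frames:
            h = self.step(x, h)
        # Classify the utterance from the final hidden state.
        logits = matvec(self.W_out, h)
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Stand-in for encoder features: 20 frames of 8 values each (toy sizes).
rng = random.Random(1)
frames = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]

model = TinyGRUClassifier(feat_dim=8, hidden_dim=16, num_classes=4)
probs = model.predict_proba(frames)
print(probs)  # one probability per assumed severity class
```

In practice the weights would be learned from labelled utterances with a deep learning framework; the sketch only shows how a variable-length sequence of frame features is reduced to a single severity prediction.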