Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra

Conference Paper (Interspeech 2023)
Author(s)

Zhengjun Yue (TU Delft - Multimedia Computing, King’s College London)

Erfan Loweimi (King’s College London, University of Cambridge)

Zoran Cvetkovic (King’s College London)

Multimedia Computing
Copyright
© 2023 Z. Yue, Erfan Loweimi, Zoran Cvetkovic
DOI
https://doi.org/10.21437/Interspeech.2023-222
Publication Year
2023
Language
English
Volume number
2023-August
Pages (from-to)
1533-1537
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this paper, we explore the effectiveness of deploying the raw phase and magnitude spectra for dysarthric speech recognition, detection and classification. In particular, we scrutinise the usefulness of various raw phase-based representations along with their combinations with the raw magnitude spectrum and filterbank features. We employ single- and multi-stream architectures consisting of a cascade of convolutional, recurrent and fully-connected layers for acoustic modelling. Furthermore, we investigate various configurations and fusion schemes as well as their training dynamics. In addition, the accuracies of the raw phase- and magnitude-based systems in the detection and classification tasks are studied and discussed. We report the performance on the UASpeech and TORGO dysarthric speech databases and for different severity levels. Our best system achieved WERs of 31.2% and 9.1% for dysarthric and typical speech on TORGO, respectively, and 30.2% on UASpeech.
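The raw spectral streams mentioned in the abstract can be illustrated with a minimal front-end sketch. This is an assumption-laden illustration, not the authors' exact pipeline: it frames a waveform with a Hamming window, takes the FFT, and returns the magnitude and phase spectra as the two parallel feature streams; frame length, hop and FFT size are illustrative defaults.

```python
import numpy as np

def raw_spectra(signal, frame_len=400, hop=160, n_fft=512):
    """Return raw magnitude and phase spectra of a waveform.

    A minimal sketch (hypothetical parameters, not the paper's exact
    front-end): Hamming-windowed frames -> FFT -> |X| and angle(X),
    which could feed the magnitude and phase streams of a
    multi-stream acoustic model.
    """
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window
         for i in range(n_frames)]
    )
    spec = np.fft.rfft(frames, n=n_fft, axis=1)
    return np.abs(spec), np.angle(spec)

# Example: one second of a 100 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 100 * t)
mag, phase = raw_spectra(x)
print(mag.shape, phase.shape)  # (98, 257) (98, 257)
```

Each stream here is a (frames x frequency-bins) matrix; in a multi-stream setup such matrices would typically be processed by separate convolutional branches before fusion.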

Files

Yue23_interspeech.pdf
(pdf | 0.364 MB)
License info not available