Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra

Journal Article (2023)

Author(s)

Zhengjun Yue (TU Delft - Multimedia Computing, King’s College London)

Erfan Loweimi (King’s College London, University of Cambridge)

Zoran Cvetkovic (King’s College London)

Multimedia Computing

DOI related publication

https://doi.org/10.21437/Interspeech.2023-222

Dysarthric speech processing; Raw phase and magnitude spectra; Single- and multi-stream acoustic modelling

To reference this document use:

https://resolver.tudelft.nl/uuid:3dd58d3b-4b5a-4eca-aedd-72bc523a2979

Publication Year

2023

Language

English

Copyright

Multimedia Computing

Volume number

2023-August

Pages (from-to)

1533-1537

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this paper, we explore the effectiveness of deploying the raw phase and magnitude spectra for dysarthric speech recognition, detection and classification. In particular, we scrutinise the usefulness of various raw phase-based representations along with their combinations with the raw magnitude spectrum and filterbank features. We employ single- and multi-stream architectures consisting of a cascade of convolutional, recurrent and fully-connected layers for acoustic modelling. Furthermore, we investigate various configurations and fusion schemes as well as their training dynamics. In addition, the accuracies of the raw phase- and magnitude-based systems in the detection and classification tasks are studied and discussed. We report the performance on the UASpeech and TORGO dysarthric speech databases and for different severity levels. Our best system achieved WERs of 31.2% and 9.1% for dysarthric and typical speech, respectively, on TORGO, and 30.2% on UASpeech.
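As a rough illustration of what "raw magnitude and phase spectra" refers to, the sketch below frames a waveform and takes the magnitude and wrapped phase of its short-time Fourier transform. It is not the authors' feature pipeline: the function name `raw_spectra`, the 25 ms frame, 10 ms hop, Hamming window and 512-point FFT are all assumptions chosen for the example, and the specific phase-based representations studied in the paper are not reproduced here.

```python
import numpy as np


def raw_spectra(signal, frame_len=400, hop=160, n_fft=512):
    """Return (magnitude, phase), each of shape (num_frames, n_fft // 2 + 1).

    Hypothetical helper for illustration only; the frame length, hop size
    and FFT size are assumed values (25 ms / 10 ms at 16 kHz).
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Zero-pad very short inputs so that at least one frame exists.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # One-sided FFT of each frame (zero-padded to n_fft points).
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    magnitude = np.abs(spectrum)   # raw magnitude spectrum
    phase = np.angle(spectrum)     # wrapped phase, within [-pi, pi]
    return magnitude, phase


# Toy input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
mag, phase = raw_spectra(tone)
```

In a multi-stream setup of the kind the abstract describes, arrays such as `mag` and `phase` would feed separate input streams before fusion; how the streams are built and fused in the paper is described in the PDF, not in this sketch.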

Files

Yue23_interspeech.pdf

(pdf | 0.364 Mb)

License info not available