Comparing and Analyzing Different Speech Conversion Techniques for Transforming Dysarthric to Normal Speech

Master Thesis (2024)
Author(s)

J. Liu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)

Qun Song – Graduation committee member (TU Delft - Embedded Systems)

Zhengjun Yue – Graduation committee member (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2024
Language
English
Graduation Date
29-05-2024
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Dysarthric speech, characterized by articulation problems and a slower speech rate, yields lower automatic speech recognition (ASR) performance than normal speech. To improve performance, researchers often enhance dysarthric speech to make it more like normal speech before passing it to an ASR system trained on normal speech. In this project, we compare different signal processing and voice conversion techniques for dysarthric-to-normal speech enhancement. The enhanced speech is evaluated objectively using an ASR system trained on normal speech, and its naturalness and intelligibility are evaluated subjectively through listening experiments. Finally, we analyze the correlation between the subjective and objective evaluations. We found that, among the techniques investigated, time-stretching performed best in the objective evaluation, surpassing state-of-the-art voice conversion methods. Across all methods, improvements in naturalness and intelligibility were positively correlated with improvements in ASR performance; however, this correlation was statistically significant for some methods but not for others.
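
The abstract reports a positive correlation between subjective gains (naturalness, intelligibility) and ASR gains, significant for some methods only. This page does not say which statistic or test was used, so the sketch below is an assumption: a Pearson correlation with a two-sided permutation test, run separately per conversion method on paired improvement scores (for example, gain in listener rating versus reduction in word error rate). The function names and inputs are hypothetical and not taken from the thesis.

```python
import math
import random
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def permutation_p_value(
    x: Sequence[float], y: Sequence[float], n_perm: int = 10_000, seed: int = 0
) -> float:
    """Two-sided permutation test of H0: x and y are not associated.

    Hypothetical helper: shuffles y to break the pairing, recomputes r,
    and counts how often |r_perm| is at least as large as |r_obs|.
    """
    rng = random.Random(seed)
    r_obs = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # small tolerance so floating-point noise does not hide ties
        if abs(pearson_r(x, y_perm)) >= r_obs - 1e-12:
            hits += 1
    # +1 correction keeps the estimated p-value strictly above zero
    return (hits + 1) / (n_perm + 1)
```

Calling both functions once per method (for instance, `pearson_r(rating_gain, wer_reduction)` and `permutation_p_value(rating_gain, wer_reduction)`, with hypothetical per-utterance lists) gives one correlation and one p-value per method. A permutation test is used here only because it needs no distributional assumptions and no third-party packages; a t-based p-value would be an equally valid choice.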

Files

TUD_Msc_Thesis_Jingxian.pdf
(pdf | 1.41 MB)
- Embargo expired on 01-10-2024
License info not available