Comparing and Analyzing Different Speech Conversion Techniques for Transforming Dysarthric to Normal Speech

Master Thesis (2024)

Author(s)

J. Liu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)

Qun Song – Graduation committee member (TU Delft - Embedded Systems)

Zhengjun Yue – Graduation committee member (TU Delft - Multimedia Computing)

Faculty

Electrical Engineering, Mathematics and Computer Science

Intelligibility, Naturalness, Dysarthric speech recognition, Voice conversion

To reference this document use:

https://resolver.tudelft.nl/uuid:f25eb395-75ea-43dc-a9b9-4646ef077eab

More Info

Publication Year

2024

Language

English

Graduation Date

29-05-2024

Awarding Institution

Delft University of Technology

Programme

Electrical Engineering | Embedded Systems

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Dysarthric speech, characterized by articulation problems and a slower speech rate, yields lower automatic speech recognition (ASR) performance than normal speech. To improve performance, researchers often enhance dysarthric speech so that it more closely resembles normal speech before passing it to an ASR system trained on normal speech. In this project, we compare different signal processing and voice conversion techniques for dysarthric-to-normal speech enhancement. The enhanced speech is evaluated objectively using an ASR system trained on normal speech, and its naturalness and intelligibility are evaluated subjectively through listening experiments. Finally, we analyze the correlation between the subjective and objective evaluations. Among the techniques investigated, time-stretching performed best in the objective evaluation, surpassing state-of-the-art voice conversion methods. Across all methods, improvements in naturalness and intelligibility were positively correlated with improvements in ASR performance; however, this correlation was significant for some methods but not for others.
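The evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not name the objective ASR metric or the correlation statistic used in the thesis; the example below assumes word error rate (WER) and the Pearson correlation coefficient, which are common choices. The function names `word_error_rate` and `pearson` are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: WER is the objective ASR metric; the abstract does not say so.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def pearson(xs, ys) -> float:
    """Pearson correlation between two equally long sequences of numbers.

    Assumption: Pearson's r stands in for whatever statistic the thesis used.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

In a setup like this, one would compute the WER of each utterance before and after enhancement, take the difference as the objective improvement, and correlate those differences with the corresponding changes in listener ratings for each method.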

Files

TUD_Msc_Thesis_Jingxian.pdf

(pdf | 1.41 MB)

- Embargo expired on 01-10-2024

License info not available