The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

Journal Article (2022)
Author(s)

Luke Prananta (Student TU Delft)

Bence Mark Halpern (Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis, TU Delft - Electrical Engineering, Mathematics and Computer Science, Universiteit van Amsterdam)

Siyuan Feng (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Odette Scharenborg (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Multimedia Computing
DOI related publication
https://doi.org/10.21437/Interspeech.2022-190 (final published version)
Publication Year
2022
Language
English
Volume number
2022-September
Pages (from-to)
36-40
Event
23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 (2022-09-18 - 2022-09-22), Incheon, Republic of Korea
Downloads counter
317
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this paper, we investigate several existing and one new state-of-the-art generative adversarial network (GAN)-based voice conversion methods for enhancing dysarthric speech to improve dysarthric speech recognition. We compare key components of existing methods in a rigorous ablation study to find the most effective solution for improving dysarthric speech recognition. We find that straightforward signal processing methods, such as stationary noise removal and vocoder-based time stretching, lead to dysarthric speech recognition results comparable to those obtained with state-of-the-art GAN-based voice conversion methods, as measured on a phoneme recognition task. Additionally, our proposed solution, a combination of MaskCycleGAN-VC and time stretching, improves the phoneme recognition results for certain dysarthric speakers compared to our time-stretched baseline.
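
To illustrate the idea behind vocoder-based time stretching mentioned in the abstract, here is a minimal sketch, not the authors' implementation. It assumes that some vocoder has already analysed the utterance into one feature vector per frame, and it only shows the step of resampling those frames along the time axis; the analysis and resynthesis steps are omitted. The function name `stretch_frames` and the `rate` parameter are hypothetical names chosen for this example.

```python
def stretch_frames(frames, rate):
    """Linearly interpolate frame-level features along the time axis.

    frames: list of equal-length feature vectors, one per analysis frame
            (assumed to come from a vocoder analysis step not shown here).
    rate:   hypothetical tempo factor; rate > 1 yields fewer frames
            (shorter utterance), rate < 1 yields more frames (longer).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not frames:
        return []

    n_in = len(frames)
    n_out = max(1, round(n_in / rate))

    # Degenerate cases: nothing to interpolate between.
    if n_in == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]

    stretched = []
    for j in range(n_out):
        # Map output frame j to a fractional position in the input sequence,
        # keeping the first and last frames aligned.
        pos = j * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        stretched.append(
            [(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])]
        )
    return stretched


# Toy usage: three one-dimensional frames stretched to twice the duration.
toy_frames = [[0.0], [1.0], [2.0]]
slower = stretch_frames(toy_frames, rate=0.5)
faster = stretch_frames([[0.0], [1.0], [2.0], [3.0]], rate=2.0)
```

In a real pipeline, the resampled frames would be passed back to the vocoder's synthesis step to produce audio. Plain linear interpolation is only an assumption made here for brevity; features with discontinuities, such as a pitch track with unvoiced frames, would likely need separate handling.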

Files

Prananta22_interspeech.pdf
(pdf | 4.33 MB)
- Embargo expired on 01-07-2023
License info not available