The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

Journal Article (2022)

Author(s)

Bence M. Halpern (Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis, TU Delft - Multimedia Computing, Universiteit van Amsterdam)

Siyuan Feng (TU Delft - Multimedia Computing)

Odette Scharenborg (TU Delft - Multimedia Computing)

Multimedia Computing

DOI related publication

https://doi.org/10.21437/Interspeech.2022-190

Keywords

Voice conversion, Generative adversarial networks, Dysarthric speech recognition, Time stretching

To reference this document use:

https://resolver.tudelft.nl/uuid:9d9665c2-cbc6-4779-b9ec-d1f424580165

Publication Year

2022

Language

English

Volume number

2022-September

Pages (from-to)

36-40

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this paper, we investigate several existing generative adversarial network (GAN)-based voice conversion methods, together with a new state-of-the-art one, for enhancing dysarthric speech to improve dysarthric speech recognition. We compare key components of the existing methods in a rigorous ablation study to find the most effective solution. We find that straightforward signal processing methods, such as stationary noise removal and vocoder-based time stretching, yield dysarthric speech recognition results comparable to those of state-of-the-art GAN-based voice conversion methods, as measured on a phoneme recognition task. Additionally, our proposed combination of MaskCycleGAN-VC and time stretching improves phoneme recognition results for certain dysarthric speakers compared to our time-stretched baseline.
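
The abstract does not say how the vocoder-based time stretching is implemented. As an illustration only, one common approach is to analyse speech into per-frame vocoder parameters, resample those frames along the time axis, and resynthesise with the original frame shift. The sketch below shows only the resampling step, using linear interpolation. The function name `time_stretch_frames`, the `rate` parameter, and the choice of linear interpolation are assumptions made for this example and are not taken from the paper.

```python
def time_stretch_frames(frames, rate):
    """Linearly resample a sequence of feature frames along the time axis.

    Hypothetical helper for illustration; not the authors' implementation.

    frames: list of equal-length lists of floats, one per analysis frame.
    rate:   speed factor; rate > 1 shortens the sequence, rate < 1 lengthens it.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not frames:
        return []

    n_in = len(frames)
    n_out = max(1, round(n_in / rate))

    # With a single input or output frame there is nothing to interpolate.
    if n_in == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]

    stretched = []
    for i in range(n_out):
        # Map output frame i to a fractional position in the input sequence,
        # so that the first and last frames are preserved exactly.
        pos = i * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        stretched.append(
            [(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])]
        )
    return stretched


# Usage: 8 toy frames with 2 features each, played back at twice the speed.
toy_frames = [[float(i), 2.0 * i] for i in range(8)]
faster = time_stretch_frames(toy_frames, 2.0)
print(len(toy_frames), len(faster))  # 8 frames in, 4 frames out
```

In a real pipeline the frames would hold vocoder parameters such as the spectral envelope, and parameters with discontinuities (for example F0 at voiced/unvoiced boundaries) would likely need more careful handling than plain linear interpolation.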

Files

Prananta22_interspeech.pdf

(pdf | 4.33 Mb)

- Embargo expired on 01-07-2023

License info not available