Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech

Conference Paper (2025)

Author(s)

Dimme de Groot (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Tanvina Patel (Erasmus MC, TU Delft - Electrical Engineering, Mathematics and Computer Science)

Devendra Kayande (Indian Institute of Information Technology, Allahabad, TU Delft - Electrical Engineering, Mathematics and Computer Science)

Odette Scharenborg (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Zhengjun Yue (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Multimedia Computing

Speech enhancement, Automatic speech recognition, Dysarthric speech, Diffusion models

DOI related publication

https://doi.org/10.21437/Interspeech.2025-2768 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:d508445f-8e8b-4966-8c24-e2b70565f09f

Publication Year

2025

Language

English

Journal title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Pages (from-to)

2740-2744

Event

26th Interspeech Conference 2025 (2025-08-17 - 2025-08-21), Rotterdam, Netherlands

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Dysarthric speech poses significant challenges for automatic speech recognition (ASR) systems due to its high variability and reduced intelligibility. In this work, we explore the use of diffusion models for dysarthric speech enhancement. Our hypothesis is that diffusion-based speech enhancement moves the distribution of dysarthric speech closer to that of typical speech, which could improve dysarthric speech recognition performance. We assess the effect of two diffusion-based speech enhancement algorithms and one signal-processing-based algorithm on the intelligibility and speech quality of two English dysarthric speech corpora. We apply speech enhancement to both typical and dysarthric speech and evaluate ASR performance using Whisper-Turbo, as well as the subjective and objective speech quality of the original and enhanced dysarthric speech. We also fine-tune Whisper-Turbo on the enhanced speech to assess its impact on recognition performance.
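
The abstract does not state which metric is used to score ASR performance. As an illustration only, the sketch below assumes word error rate (WER), a common choice for comparing transcripts of original and enhanced speech. The function name `word_error_rate` and the lowercase, whitespace-based normalization are assumptions made for this example, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch: text is normalized only by lowercasing and
    splitting on whitespace (an assumption, not the paper's setup).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, not data from the paper.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Under this assumption, the same function could be applied to transcripts of the original and the enhanced recordings of an utterance; a lower value for the enhanced version would indicate that enhancement helped recognition for that utterance.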

Files

Degroot25_interspeech.pdf

(pdf | 0.52 Mb)