End-to-end acoustic-articulatory dysarthric speech recognition leveraging large-scale pretrained acoustic features

Conference Paper (2025)
Author(s)

Zhengjun Yue (TU Delft - Multimedia Computing)

Yuanyuan Zhang (TU Delft - Multimedia Computing)

DOI related publication
https://doi.org/10.1109/ICASSP49660.2025.10888412
Publication Year
2025
Language
English
Bibliographical Note
Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Pages (from-to)
1-5
ISBN (electronic)
979-8-3503-6874-1
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automatic dysarthric speech recognition (ADSR) remains challenging because of the speech irregularities caused by motor control impairments and the limited availability of dysarthric speech data. This paper explores the integration of articulatory features, captured using Electromagnetic Articulography (EMA), with both conventional acoustic features and features extracted from large-scale pretrained models, including Whisper and XLSR-53 as well as a fine-tuned Whisper model. We propose end-to-end (E2E) Conformer-based acoustic-articulatory models for ADSR and compare their performance against the corresponding hybrid TDNNF models. The experimental results show that fine-tuned Whisper features (Whisper-FT) fused with articulatory features achieve the lowest word error rate (WER) on dysarthric speech, 10.5%, with particularly significant improvements for severely dysarthric speech, where the WER reaches 20.8%.
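For readers unfamiliar with the metric reported above, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition only. The function name `word_error_rate` and the example sentences are illustrative assumptions, and the scoring tool and text normalization actually used in the paper are not stated on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the paper): splits on whitespace and
    applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("sat" -> "sit") and one deletion ("the")
# against a six-word reference.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition a WER of 10.5% corresponds to roughly one word error per ten reference words. Note that WER can exceed 100% when the hypothesis contains many insertions.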

Files

License info not available

File under embargo until 15-09-2025