Improving the Performance of Automatic Speech Recognition for Children with Developmental Language Disorders
X. Wan (TU Delft - Electrical Engineering, Mathematics and Computer Science)
O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)
J. Sun – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
T.J. Viering – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Z. Yue – Mentor (TU Delft - Multimedia Computing)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Automatic Speech Recognition (ASR) systems perform well on typical adult speech but remain challenged by children’s speech, especially that of children with Developmental Language Disorder (DLD). This study investigates how ASR performance can be improved for DLD speech while maintaining accuracy on typical child speech. Two state-of-the-art ASR models, a Conformer-based model and Whisper-large-v3, were evaluated on Dutch typical (Jasmin) and atypical (Auris) child speech. The experiments examined data augmentation methods, including speed perturbation and vocal tract length perturbation, as well as transfer learning through fine-tuning. Results show that both techniques improve DLD speech recognition without degrading accuracy on typical child speech. The best performance was achieved by combining augmentation with fine-tuning on domain-matched DLD data, reaching a word error rate (WER) of 53.2% on the Auris test set, whereas mismatched fine-tuning reduced the gains, particularly for Whisper. Overall, the findings demonstrate that integrating data augmentation and fine-tuning offers an effective, balanced approach toward inclusive and robust ASR for children with DLD.
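
To illustrate the speed-perturbation augmentation mentioned in the abstract, the following is a minimal Python sketch assuming torchaudio is available; the function name speed_perturb, the 0.9/1.0/1.1 factors, and the file name utterance.wav are illustrative assumptions and do not come from the thesis itself.

# Illustrative sketch of Kaldi-style speed perturbation (not the thesis code).
import torch
import torchaudio
import torchaudio.functional as F

def speed_perturb(waveform: torch.Tensor, sample_rate: int, factor: float) -> torch.Tensor:
    # Reinterpret the samples as if recorded at sample_rate * factor and resample
    # back to sample_rate: playback at sample_rate then shortens the utterance by
    # `factor` and raises the pitch by the same factor (speed and pitch together).
    return F.resample(waveform, orig_freq=int(sample_rate * factor), new_freq=sample_rate)

# Example: create augmented copies of one utterance with the common 0.9/1.0/1.1 factors.
waveform, sr = torchaudio.load("utterance.wav")  # hypothetical input file
augmented = [speed_perturb(waveform, sr, f) for f in (0.9, 1.0, 1.1)]

The factor set 0.9/1.0/1.1 is the conventional choice in the speech-recognition literature; whether the thesis uses these exact factors, or applies the perturbation on the fly or offline, is not stated in the abstract.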