Improving the Performance of Automatic Speech Recognition for Children with Developmental Language Disorders

Master Thesis (2025)
Author(s)

X. Wan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)

J. Sun – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

T.J. Viering – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Z. Yue – Mentor (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
03-11-2025
Awarding Institution
Delft University of Technology
Programme
Computer Science | Multimedia Computing; Computer Science | Artificial Intelligence
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automatic Speech Recognition (ASR) systems perform well on typical adult speech but remain challenged by children’s speech, especially that of children with Developmental Language Disorder (DLD). This study investigates how ASR performance can be improved for DLD speech while maintaining accuracy on typical child speech. Two state-of-the-art ASR models, a Conformer-based model and Whisper-large-v3, were evaluated on Dutch typical (Jasmin) and atypical (Auris) child speech. The experiments examined data augmentation methods, including speed perturbation and vocal tract length perturbation (VTLP), and transfer learning through fine-tuning. Results show that both techniques improve DLD speech recognition without degrading accuracy on typical child speech. The best performance, a word error rate (WER) of 53.2% on the Auris test set, was achieved by combining augmentation with fine-tuning on domain-matched DLD data, whereas fine-tuning on domain-mismatched data reduced the gains, particularly for Whisper. Overall, the findings demonstrate that combining data augmentation and fine-tuning offers an effective, balanced approach toward inclusive and robust ASR for children with DLD.
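The abstract relies on three technical ingredients: speed perturbation, vocal tract length perturbation, and the word error rate used to report results. The Python sketch below illustrates each one in its simplest form. It is not the thesis's implementation: the function names, the linear-interpolation resampler, the 16 kHz sampling rate and the VTLP boundary frequency `f_hi=4800.0` are assumptions made for this example only.

```python
def speed_perturb(samples, factor):
    """Speed perturbation: resample so the audio plays `factor` times faster.

    Tempo and pitch change together; the output has about len(samples) / factor
    samples. Linear interpolation between neighbouring samples is an assumed
    simplification (speech toolkits use band-limited resampling).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor  # fractional read position in the input signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[right])
    return out


def vtlp_warp(freq, alpha, sample_rate=16000, f_hi=4800.0):
    """VTLP: piecewise-linear warp of a frequency (Hz) by factor `alpha`.

    Below the boundary, frequencies are scaled by alpha; above it, a second
    linear segment keeps the Nyquist frequency fixed, so the warped axis
    still spans [0, sample_rate / 2]. sample_rate and f_hi are assumed values.
    """
    nyquist = sample_rate / 2.0
    boundary = f_hi * min(alpha, 1.0) / alpha
    if freq <= boundary:
        return freq * alpha
    scale = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    return nyquist - scale * (nyquist - freq)


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: word-level edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(len(speed_perturb([0.0] * 16000, 1.1)))  # 14545
    print(round(vtlp_warp(1000.0, 1.1), 1))        # 1100.0
    print(round(word_error_rate("de kat zit op de mat", "de kat zat op mat"), 3))  # 0.333
```

In this sketch, perturbation factors around 1.0 (for example 0.9 to 1.1) are an assumed, commonly used range; in practice VTLP is usually applied by warping filterbank centre frequencies rather than individual frequencies. Because insertions are counted, WER is not bounded by 100%.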

Files

License info not available