Improving the Performance of Automatic Speech Recognition for Children with Developmental Language Disorders
X. Wan (TU Delft - Electrical Engineering, Mathematics and Computer Science)
O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)
J. Sun – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
T.J. Viering – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Z. Yue – Mentor (TU Delft - Multimedia Computing)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Automatic Speech Recognition (ASR) systems perform well on typical adult speech but remain challenged by children’s speech, especially that of children with Developmental Language Disorder (DLD). This study investigates how ASR performance can be improved for DLD speech while maintaining accuracy on typical child speech. Two state-of-the-art ASR models, a Conformer-based model and Whisper-large-v3, were evaluated on Dutch typical (Jasmin) and atypical (Auris) child speech. The experiments examined data augmentation methods, including speed perturbation and vocal tract length perturbation, as well as transfer learning through fine-tuning. Results show that both techniques improve DLD speech recognition without degrading accuracy on typical child speech. The best performance was achieved by combining augmentation with fine-tuning on domain-matched DLD data, reaching a word error rate (WER) of 53.2% on the Auris test set, whereas mismatched fine-tuning reduced the gains, particularly for Whisper. Overall, the findings demonstrate that integrating data augmentation and fine-tuning offers an effective, balanced approach toward inclusive and robust ASR for children with DLD.
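
To illustrate the speed-perturbation augmentation mentioned in the abstract, the following is a minimal Python sketch assuming torchaudio is available; the function name speed_perturb, the 0.9/1.0/1.1 factors, and the file name utterance.wav are illustrative assumptions and do not come from the thesis itself.

# Illustrative sketch of Kaldi-style speed perturbation (not the thesis code).
import torch
import torchaudio
import torchaudio.functional as F

def speed_perturb(waveform: torch.Tensor, sample_rate: int, factor: float) -> torch.Tensor:
    # Reinterpret the samples as if recorded at sample_rate * factor and resample
    # back to sample_rate: playback at sample_rate then shortens the utterance by
    # `factor` and raises the pitch by the same factor (speed and pitch together).
    return F.resample(waveform, orig_freq=int(sample_rate * factor), new_freq=sample_rate)

# Example: create augmented copies of one utterance with the common 0.9/1.0/1.1 factors.
waveform, sr = torchaudio.load("utterance.wav")  # hypothetical input file
augmented = [speed_perturb(waveform, sr, f) for f in (0.9, 1.0, 1.1)]

The factor set 0.9/1.0/1.1 is the conventional choice in the speech-recognition literature; whether the thesis uses these exact factors, or applies the perturbation on the fly or offline, is not stated in the abstract.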