Automatic Speech Recognition (ASR) systems perform well on typical adult speech but still struggle with children’s speech, especially that of children with Developmental Language Disorder (DLD). This study investigates how ASR performance can be improved for DLD speech while maintaining accuracy on typical child speech. Two state-of-the-art ASR models, a conformer-based model and Whisper-large-v3, were evaluated on Dutch typical (Jasmin) and atypical (Auris) child speech. The experiments examined data augmentation methods, including speed perturbation and vocal tract length perturbation, and transfer learning through fine-tuning. Results show that both augmentation and fine-tuning improve DLD speech recognition without degrading accuracy on typical speech. The best performance was achieved by combining augmentation with fine-tuning on domain-matched DLD data, reaching a word error rate (WER) of 53.2% on the Auris test set, whereas fine-tuning on domain-mismatched data reduced the gains, particularly for Whisper. Overall, the findings demonstrate that integrating data augmentation and fine-tuning offers an effective, balanced approach toward inclusive and robust ASR for children with DLD.
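
For readers unfamiliar with the WER metric reported above, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the Dutch example sentences are hypothetical and are not taken from the Jasmin or Auris corpora; the scoring tool and text normalization used in the study are not stated here, so this should be read as an illustration rather than the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace, with no case folding or
    punctuation handling (real evaluations usually normalize text first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of six reference words is dropped.
example_wer = word_error_rate("de kat zit op de mat", "de kat zit op mat")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis contains many extra words, which is worth keeping in mind when comparing scores on difficult speech.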