Reducing Bias in State-of-the-Art ASR Systems for Child Speech

Addressing Age and Gender Disparities through Transfer Learning Strategies


Abstract

Automatic Speech Recognition (ASR) systems have transformed human-machine interaction, yet they often struggle with child speech because of children's unique vocal characteristics. This thesis investigates age and gender biases in child speech recognition, focusing on improving the performance of the state-of-the-art ASR model Whisper. Initial experiments reveal significant disparities in recognition accuracy across age groups and genders within child speech, highlighting the need for targeted improvements. The study uses Low-Rank Adaptation (LoRA) to finetune the model on four child-specific datasets, aiming to improve recognition performance and mitigate biases at the same time. Results show substantial reductions in Word Error Rate (WER) and in both biases after finetuning, demonstrating the effectiveness of transfer learning in addressing demographic inequality. Gender bias decreased by 32.77% and age bias by 27.52%, both relative to their initial values. This research shows the potential of tailored approaches to advance ASR technology for low-resource user demographics, with implications for improving educational and assistive technologies.

Index Terms: Automatic Speech Recognition, Child speech, Whisper ASR model, Age and gender biases, Low-Rank Adaptation, Transfer learning, Demographic disparities
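
The two kinds of figures quoted in the abstract, a Word Error Rate and a relative decrease in percent, can be illustrated with a minimal Python sketch. It assumes the standard word-level edit-distance definition of WER and the usual (before - after) / before formula for a relative decrease. The abstract does not state how the bias values themselves are computed, so the function names `word_error_rate` and `relative_decrease` are hypothetical, and the sketch is not the thesis's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_decrease(before: float, after: float) -> float:
    """Relative decrease in percent: (before - after) / before * 100."""
    if before == 0:
        raise ValueError("'before' must be non-zero")
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Toy transcript pair, not taken from the thesis datasets.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical bias values before and after finetuning.
    print(relative_decrease(0.10, 0.06))
```

Under this reading, a figure such as the reported 32.77% means the bias value after finetuning is roughly two thirds of its initial value. It does not mean the value dropped by 32.77 percentage points.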