Reducing Bias in State-of-the-Art ASR Systems for Child Speech

Addressing Age and Gender Disparities through Transfer Learning Strategies

Bachelor Thesis (2024)

Author(s)

F.A. Zeisler (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Y. Zhang – Mentor (TU Delft - Multimedia Computing)

Z. Yue – Mentor (TU Delft - Multimedia Computing)

Thomas Durieux – Graduation committee member (TU Delft - Software Engineering)

Faculty

Electrical Engineering, Mathematics and Computer Science

Transfer Learning, Automatic Speech Recognition, Low-Rank Adaptation, Whisper ASR Model, Child Speech, Age and gender bias, Demographic Disparities

To reference this document use:

https://resolver.tudelft.nl/uuid:86f166d9-e13f-4084-a5bf-7a7632604b52

Publication Year

2024

Language

English

Graduation Date

27-06-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automatic Speech Recognition (ASR) systems have transformed human-machine interaction, yet they often struggle with child speech because of children's distinctive vocal characteristics. This thesis investigates age and gender biases in the state-of-the-art ASR model Whisper, with a focus on improving its performance on child speech. Initial experiments reveal significant disparities in recognition accuracy across age groups and genders within child speech, highlighting the need for targeted improvements. The study uses Low-Rank Adaptation (LoRA) to fine-tune the model on four child-specific datasets, aiming to improve recognition performance and mitigate biases at the same time. Results show substantial reductions in Word Error Rate (WER) and in both biases after fine-tuning, demonstrating the effectiveness of transfer learning in addressing demographic inequality: gender bias decreased by 32.77% and age bias by 27.52%, each relative to its value before fine-tuning. This research shows the potential of tailored approaches to advance ASR technology for low-resource user demographics, with implications for improving educational and assistive technologies.

Index Terms: Automatic Speech Recognition, Child speech, Whisper ASR model, Age and gender biases, Low-Rank Adaptation, Transfer learning, Demographic disparities
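
The abstract reports results as Word Error Rate (WER) and as relative decreases in bias. The sketch below illustrates how those two quantities are commonly computed. It is not the thesis's evaluation code: the function names are hypothetical, the WER uses plain whitespace tokenisation with no text normalisation, and the abstract does not state how the bias score itself is defined, so the before and after values in the usage lines are made-up placeholders.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace and compared exactly;
    real evaluations usually normalise case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_decrease(before: float, after: float) -> float:
    """Percentage decrease of a score relative to its initial value."""
    if before == 0:
        raise ValueError("initial value must be non-zero")
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") in 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical bias scores before and after fine-tuning.
    print(relative_decrease(before=10.0, after=7.0))
```

A relative decrease is used rather than an absolute difference so that bias reductions measured on different scales (for example age versus gender) can be compared against their own starting points.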

Files

FZ_5029341_Final_Paper_v2.pdf

(pdf | 0.302 MB)

License info not available