Mitigating Regional Accent Bias in ASR Systems

Li, Zirui

Title

Mitigating Regional Accent Bias in ASR Systems

Author

Li, Zirui (TU Delft Electrical Engineering, Mathematics and Computer Science)

Contributor

Scharenborg, O.E. (mentor)
Zuniga, Marco (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Electrical Engineering | Embedded Systems

Date

2023-07-11

Abstract

End-to-end (E2E) Automatic Speech Recognition (ASR) systems have improved drastically in recent years and perform extremely well on many large datasets. However, research shows that these models fail to capture the variability in speech production and are biased against the variation introduced by regionally accented speech. Moreover, ASR research on regional accents has primarily been conducted on languages with large speaker populations, such as English and Arabic, and the effect of regionally accented speech on E2E ASR systems in less-studied languages is still unknown. Understanding this effect is important because it helps researchers build inclusive E2E ASR systems. In this project, I aim to mitigate the bias against regionally accented speech. I select standard speech and regionally accented speech from the CommonVoice French and German datasets. I combine the state-of-the-art Conformer Recurrent Neural Network Transducer model with Multi-Domain Adversarial Training (MDAT) to improve performance on regionally accented speech without degrading performance on standard speech. Since regionally accented speech is typically low-resourced, I also study the amount of data required for effective MDAT, as well as the effect of different domain classifiers on MDAT performance. Experimental results show that MDAT can mitigate the bias against regionally accented speech in both French and German: the best French model reduces the bias by around 12%, and the best German model reduces it by around 7%. MDAT is also data-efficient: using only a small amount (e.g., 30 minutes) of untranscribed regionally accented speech, it achieves performance similar to that of an MDAT model trained on the full dataset. Finally, the different domain classifier architectures were found to have similar effects on MDAT results, with no significant differences among the domain classifiers used in this project.
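The abstract does not describe how MDAT is implemented. As a hedged illustration of the general technique it builds on (domain-adversarial training with a gradient reversal layer), the pure-Python sketch below performs one adversarial update. Everything here is an assumption for illustration, not the thesis code: a linear "encoder" `W` stands in for the Conformer encoder, "multi-domain" is modeled as a single softmax classifier `V` over several domain labels (e.g., standard speech plus regional accents), the function names are hypothetical, and the transducer loss on transcribed speech is omitted. Because the domain branch only needs a domain label and no transcript, this setup is consistent with the abstract's use of untranscribed accented speech.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax."""
    zmax = max(z)
    exps = [math.exp(zi - zmax) for zi in z]
    s = sum(exps)
    return [e / s for e in exps]


def domain_loss(W, V, x, d):
    """Cross-entropy of the (hypothetical) domain classifier V on features W @ x."""
    p = softmax(matvec(V, matvec(W, x)))
    return -math.log(p[d])


def grad_reverse(grad, lam):
    """Gradient reversal layer, backward pass: identity forward, -lam * grad backward."""
    return [-lam * g for g in grad]


def mdat_step(W, V, x, d, lr=0.01, lam=1.0):
    """One adversarial update for input x with domain label d (no transcript needed).

    The classifier V descends the domain loss; because of the reversed gradient,
    the encoder W ascends it, pushing features toward domain invariance.
    """
    h = matvec(W, x)
    p = softmax(matvec(V, h))
    # dL/dlogits for softmax cross-entropy: p - onehot(d)
    g_logits = [pk - (1.0 if k == d else 0.0) for k, pk in enumerate(p)]
    # Classifier gradient: outer(g_logits, h)
    g_V = [[gk * hj for hj in h] for gk in g_logits]
    # Gradient w.r.t. features: V^T g_logits
    g_h = [sum(V[k][j] * g_logits[k] for k in range(len(V))) for j in range(len(h))]
    # Reverse (and scale) the gradient before it reaches the encoder
    g_W = [[gj * xi for xi in x] for gj in grad_reverse(g_h, lam)]
    V_new = [[v - lr * g for v, g in zip(vr, gr)] for vr, gr in zip(V, g_V)]
    W_new = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, g_W)]
    return W_new, V_new


if __name__ == "__main__":
    # Hypothetical toy setup: 3 domains (0 = standard, 1 and 2 = regional accents).
    W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]   # 3-dim input -> 2-dim features
    V = [[0.2, -0.1], [-0.3, 0.4], [0.1, 0.5]]  # one row of logit weights per domain
    x, d = [1.0, -0.5, 2.0], 1                   # one untranscribed accented sample
    W_new, V_new = mdat_step(W, V, x, d)
    print("before:            ", domain_loss(W, V, x, d))
    print("encoder updated:   ", domain_loss(W_new, V, x, d))
    print("classifier updated:", domain_loss(W, V_new, x, d))
```

In this toy setup, updating only the encoder raises the domain loss while updating only the classifier lowers it, which is the min-max game adversarial training relies on. In a real system the encoder would also receive the (non-reversed) transducer-loss gradient, and `lam` trades off ASR accuracy against domain invariance.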

Subject

bias mitigation
automatic speech recognition
regional accented speech
domain adversarial training

To reference this document use:

http://resolver.tudelft.nl/uuid:49346f35-e8eb-4f9d-870a-6ed44755f6be

Part of collection

Student theses

Document type

master thesis

Files

PDF

Mitigating_Regional_Accen ... ystems.pdf

1.81 MB