Mitigating Regional Accent Bias in ASR Systems

Author: Li, Zirui (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Scharenborg, O.E. (mentor); Zuniga, Marco (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Electrical Engineering | Embedded Systems
Date: 2023-07-11

Abstract: End-to-end (E2E) Automatic Speech Recognition (ASR) systems have improved dramatically in recent years and work extremely well on many large datasets. However, research shows that these models fail to capture the variability in speech production and are biased against the variation introduced by regional accented speech. Moreover, ASR research on regional accents is primarily done in languages used by large populations, such as English and Arabic, and the effect of regional accented speech on E2E ASR systems in less widely used languages is still unknown. Understanding this effect is important because it helps researchers build inclusive E2E ASR systems. In this project, I aim to mitigate the biases against regional accented speech. I select standard speech and regional accented speech from CommonVoice's French and German datasets. I combine the state-of-the-art Conformer Recurrent Neural Network Transducer model with Multi-Domain Adversarial Training (MDAT) to improve performance on regional accented speech without degrading performance on standard speech. Moreover, since regional accented speech is typically low-resourced, I study the amount of data required for effective MDAT, as well as the effect of different domain classifiers on the performance of MDAT. Experimental results show that MDAT can mitigate the biases against regional accented speech in both French and German.
The best model in French reduces the bias by around 12% and the best model in German reduces the bias by around 7%. Additionally, MDAT is an effective method for bias mitigation: using only a small amount (e.g. 30 minutes) of untranscribed regional accented speech, it achieves performance similar to that of the MDAT model trained with the full dataset. Finally, different domain classifier architectures were found to have similar effects on the results of MDAT, so no significant differences were observed among the domain classifiers in this project.

Subject: bias mitigation; automatic speech recognition; regional accented speech; domain adversarial training
To reference this document use: http://resolver.tudelft.nl/uuid:49346f35-e8eb-4f9d-870a-6ed44755f6be
Part of collection: Student theses
Document type: master thesis
Rights: © 2023 Zirui Li
Files: PDF Mitigating_Regional_Accen ... ystems.pdf (1.81 MB)