Bias Mitigation Against Non-native Speakers in Dutch ASR

Master Thesis (2022)

Author(s)

Y. Zhang (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)

T.B. Patel – Graduation committee member (TU Delft - Multimedia Computing)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Transfer learning, Bias, Automatic speech recognition, Data augmentation

To reference this document use:

https://resolver.tudelft.nl/uuid:df87fbca-7e88-4ea8-858b-3b8f4a194c87

Publication Year

2022

Language

English

Graduation Date

08-07-2022

Awarding Institution

Delft University of Technology

Programme

Computer Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A major obstacle to the wide deployment of automatic speech recognition (ASR) is bias: ASR systems tend to recognize speech from certain speaker groups more accurately than speech from others. In this thesis, we aim to reduce the bias against non-native speakers of Dutch relative to native speakers. An important source of bias is typically insufficient training data. We therefore investigate three data augmentation techniques (speed perturbation, volume perturbation, and pitch shift) to increase the amount of non-native accented Dutch training data, and use the augmented data with two transfer learning techniques, model fine-tuning and multi-task learning, to reduce bias in a state-of-the-art hybrid HMM-DNN ASR system built with Kaldi. Experimental results on read speech and human-machine interaction (HMI) speech showed that, although individual data augmentation techniques did not always improve recognition performance, the combination of all three did. Importantly, relative to the baseline system, bias was reduced by more than 18% absolute for read speech when applying pitch-shift data augmentation with multi-task training, and by more than 7% for HMI speech when applying all three data augmentation techniques during fine-tuning, while recognition accuracy improved for both native and non-native Dutch speech.
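
As a rough illustration of two of the augmentation techniques named in the abstract, the sketch below applies speed and volume perturbation to a raw waveform with NumPy. It is not the thesis implementation: the function names, the speed factors (0.9 and 1.1), and the gains (0.5 and 2.0) are assumptions chosen for illustration, and the speed change is approximated by simple linear-interpolation resampling. Pitch shift is left out because it needs a time-stretching step that does not fit in a few lines.

```python
import numpy as np


def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Resample so the signal plays `factor` times faster.

    Plain resampling changes both duration and pitch. This is a
    simplified stand-in for the resampling done by audio tools.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = max(1, int(round(len(wave) / factor)))
    # Positions in the original signal from which each output sample is read.
    src_pos = np.arange(n_out) * factor
    return np.interp(src_pos, np.arange(len(wave)), wave)


def volume_perturb(wave: np.ndarray, gain: float) -> np.ndarray:
    """Scale the amplitude by `gain`, clipping to the [-1, 1] range."""
    return np.clip(wave * gain, -1.0, 1.0)


def augment(wave, speed_factors=(0.9, 1.1), gains=(0.5, 2.0)):
    """Return perturbed copies of one utterance (hypothetical defaults)."""
    copies = [speed_perturb(wave, f) for f in speed_factors]
    copies += [volume_perturb(wave, g) for g in gains]
    return copies


# Toy input: one second of a 220 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.3 * np.sin(2 * np.pi * 220 * t)

augmented = augment(tone)
for copy in augmented:
    print(len(copy), round(float(np.abs(copy).max()), 3))
```

Each perturbed copy keeps the transcript of the original utterance, which is how augmentation of this kind enlarges a small accented training set without new recordings.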

Files

MasterThesis_Yixuan.pdf

(PDF | 2.24 MB)

License info not available