Exploring Data Augmentation in Bias Mitigation Against Non-Native-Accented Speech

Conference Paper (2023)

Author(s)

YuanYuan Zhang (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Aaricia Herygers (External organisation)

Tanvina Patel (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Zhengjun Yue (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Odette Scharenborg (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Multimedia Computing

Speech recognition, Data augmentation, Voice conversion, Bias mitigation, Non-native accents

DOI related publication

https://doi.org/10.1109/ASRU57964.2023.10389756 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:a2c96cbd-32a9-47e6-96a8-4a216a843187

Publication Year

2023

Language

English

ISBN (print)

979-8-3503-0690-3

ISBN (electronic)

979-8-3503-0689-7

Event

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (2023-12-16 - 2023-12-20), Taipei, Taiwan

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automatic speech recognition (ASR) should serve every speaker, not only the majority “standard” speakers of a language. To build inclusive ASR, it is crucial to mitigate the bias against speaker groups who speak in a “non-standard” or “diverse” way. We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system. Since this is a low-resource problem, we investigate which type of data augmentation is best suited to bias mitigation, comparing speed/pitch perturbation, cross-lingual voice conversion-based methods, and SpecAugment, applied to both native Flemish and non-native-accented Flemish. The results show that specific types of data augmentation applied to both native and non-native-accented speech improve non-native-accented ASR, while applying data augmentation to the non-native-accented speech is more conducive to bias reduction. Combining both gives the largest bias reduction for human-machine interaction (HMI) as well as read-type speech.
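
For readers unfamiliar with two of the augmentation types named in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It assumes NumPy only. The function names `speed_perturb` and `spec_mask`, the perturbation factors, and the mask sizes are hypothetical choices for illustration and are not taken from the paper. The first function resamples a waveform by linear interpolation, which changes speed and, at a fixed playback rate, pitch as well. The second applies only the frequency- and time-masking part of a SpecAugment-style policy, with time warping omitted.

```python
import numpy as np


def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Resample a 1-D waveform so that it plays `factor` times faster.

    Linear interpolation keeps the sketch short; production toolkits
    typically use a proper band-limited resampler instead.
    """
    n_out = int(round(len(wave) / factor))
    # Positions in the original signal that each output sample reads from.
    src_pos = np.linspace(0.0, len(wave) - 1, num=n_out)
    return np.interp(src_pos, np.arange(len(wave)), wave)


def spec_mask(spec: np.ndarray, max_f: int, max_t: int,
              rng: np.random.Generator) -> np.ndarray:
    """Zero out one random frequency band and one random time span.

    `spec` has shape (n_freq, n_frames). The input is not modified.
    """
    out = spec.copy()
    n_freq, n_frames = out.shape

    # Frequency mask: width drawn from 0..max_f, start chosen so it fits.
    f = int(rng.integers(0, min(max_f, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: width drawn from 0..max_t, start chosen so it fits.
    t = int(rng.integers(0, min(max_t, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wave = np.sin(np.linspace(0.0, 100.0, 16000))  # stand-in for 1 s of audio
    faster = speed_perturb(wave, 1.1)              # assumed factor
    slower = speed_perturb(wave, 0.9)              # assumed factor
    print(len(wave), len(faster), len(slower))

    spec = np.ones((80, 200))                      # stand-in for a log-mel spectrogram
    masked = spec_mask(spec, max_f=10, max_t=20, rng=rng)
    print(masked.shape)
```

In a low-resource setting such as the one described, each perturbed or masked copy would be added to the training set alongside the original utterance; which speaker group's data is augmented is the variable the abstract reports on.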

Files

Exploring_Data_Augmentation_in... (pdf)

(pdf | 0.511 Mb)

- Embargo expired on 19-07-2024

License info not available