Exploring the Relationship Between Bias and Speech Acoustics in Automatic Speech Recognition Systems
An Experimental Investigation Using Acoustic Embeddings and Bias Metrics on a Dataset of Spoken Dutch
P.P. Cichoń (TU Delft - Electrical Engineering, Mathematics and Computer Science)
O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)
Jorge Martinez – Mentor (TU Delft - Multimedia Computing)
N.M. Gürel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Automatic Speech Recognition (ASR) systems have become an integral part of daily life. Despite their widespread use, these systems can exhibit biases that manifest as differences in accuracy and performance across demographic groups, and methods for quantifying these biases have been developed. This paper investigates the relationship between such bias and the acoustic characteristics of speakers. By examining various acoustic embeddings, derived from models such as wav2vec 2.0 and XLSR, we aim to identify which embeddings correlate most strongly with bias. The findings offer insights into improving the fairness of ASR systems by showing how acoustic features influence bias. Future research directions include studying isolated speech properties and extending the work to more diverse linguistic contexts.
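The core analysis described in the abstract — relating per-speaker acoustic embeddings to a per-speaker bias metric — can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the embeddings here are random placeholders standing in for mean-pooled wav2vec 2.0 or XLSR representations, the WER values are synthetic, and the bias metric (each speaker's WER gap relative to the lowest observed WER) is one common choice assumed for the example.

```python
import numpy as np

# Placeholder data: in the study, embeddings would be mean-pooled outputs
# of models such as wav2vec 2.0 or XLSR, and WERs would come from an ASR
# system evaluated on Dutch speech. Here both are synthetic.
rng = np.random.default_rng(0)
n_speakers, dim = 40, 8
embeddings = rng.normal(size=(n_speakers, dim))  # per-speaker embeddings
wer = rng.uniform(0.05, 0.30, size=n_speakers)   # per-speaker word error rate

# One possible bias metric (an assumption for this sketch): each speaker's
# WER gap relative to the best-performing speaker's WER.
bias = wer - wer.min()

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# Correlate each embedding dimension with the bias metric to see which
# acoustic directions co-vary most strongly with ASR performance gaps.
correlations = [pearson(embeddings[:, d], bias) for d in range(dim)]
strongest = int(np.argmax(np.abs(correlations)))
print(f"dimension {strongest} correlates most strongly with bias")
```

In practice one would correlate whole embedding spaces (e.g. via distances or regression) rather than single dimensions, but the per-dimension Pearson correlation above conveys the basic idea of linking acoustic representations to a bias measure.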