Exploring the Relationship Between Bias and Speech Acoustics in Automatic Speech Recognition Systems

An Experimental Investigation Using Acoustic Embeddings and Bias Metrics on a Dataset of Spoken Dutch

Abstract

Automatic Speech Recognition (ASR) systems have become an integral part of daily life. Despite their widespread use, these systems can exhibit bias, which manifests as differences in accuracy and performance across demographic groups. Methods for quantifying such bias have been developed. This paper investigates the relationship between bias and the acoustic characteristics of speakers. By examining several acoustic embeddings, derived from models such as wav2vec 2.0 and XLSR, we aim to identify which embeddings correlate most strongly with bias. The findings offer insight into how acoustic features influence bias in ASR systems, and thereby into how the fairness of these systems can be improved. Future research directions include studying isolated speech properties and extending the study to other linguistic contexts to deepen understanding in this area.