Discovering Bias in Dutch Automatic Speech Recognition by Clustering Interpretable Acoustic and Prosodic Features

Abstract

State-of-the-art Dutch Automatic Speech Recognition (ASR) systems do not perform equally well across speaker groups. Existing metrics to quantify this bias rely on demographic metadata, which is often unavailable. Recent work instead uses machine learning to find groups of similar speakers, but the black-box nature of these methods obscures the interpretability of the resulting groups. This paper proposes an interpretable approach to bias discovery that clusters speakers based on acoustic and prosodic features. Different feature subsets were compared in their ability to reveal performance disparities in five ASR systems across two speaking styles. Results show that these feature sets can uncover bias of a magnitude approaching the known disparities between demographic groups. While the effectiveness of each feature set differed between the two speaking styles, the most successful sets found significant disparities between clusters with diverse demographic compositions.
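The abstract describes the pipeline only at a high level: cluster speakers on interpretable features, then compare ASR performance between clusters. The following is a minimal sketch of that idea under stated assumptions, not the paper's implementation. It assumes per-speaker feature vectors (here two hypothetical prosodic features, mean pitch in Hz and speaking rate in syllables per second), z-score standardization, plain Lloyd's k-means with a hypothetical k = 2, and word error rate (WER) pooled per cluster, with the max-minus-min WER gap as the disparity measure. The paper's actual feature sets, clustering method, number of clusters, disparity metric, and significance testing may all differ. The function names are illustrative only.

```python
import math
import random
from statistics import mean, pstdev


def standardize(features):
    """Z-score each feature column so, e.g., pitch (Hz) and rate (syll/s) weigh equally."""
    columns = list(zip(*features))
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]  # guard zero variance
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, stats)] for row in features]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in); returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assign each speaker to the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return labels


def cluster_wer(labels, errors, ref_words):
    """Pool word errors and reference words per cluster: WER = errors / words."""
    totals = {}
    for lab, e, n in zip(labels, errors, ref_words):
        err, words = totals.get(lab, (0, 0))
        totals[lab] = (err + e, words + n)
    return {lab: err / words for lab, (err, words) in totals.items()}


# Hypothetical toy data: [mean pitch (Hz), speaking rate (syll/s)] per speaker,
# plus that speaker's word errors and reference word count for one ASR system.
features = [[110, 4.0], [115, 4.2], [120, 3.9], [210, 5.5], [220, 5.8], [215, 5.6]]
errors = [5, 6, 4, 15, 18, 16]
ref_words = [100] * 6

labels = kmeans(standardize(features), k=2)
wer = cluster_wer(labels, errors, ref_words)
print(wer, "gap:", max(wer.values()) - min(wer.values()))
```

On this toy data the low-pitch and high-pitch speakers fall into separate clusters with pooled WERs of 0.05 and about 0.163. Because the clusters are defined by named features rather than learned embeddings, a gap like this can be read directly in terms of pitch and speaking rate, which is the interpretability argument the abstract makes.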