Discovering Bias in Dutch Automatic Speech Recognition by Clustering Interpretable Acoustic and Prosodic Features

Bachelor Thesis (2024)

Author(s)

K.M. Jones (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Odette Scharenborg – Mentor (TU Delft - Multimedia Computing)

Jorge Martinez – Mentor (TU Delft - Multimedia Computing)

N.M. Gürel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Feature extraction, Interpretability, Speech recognition, Fairness, Bias

To reference this document use:

https://resolver.tudelft.nl/uuid:2b141b8a-4ef8-47b8-b43c-04ccb23cc83c

Publication Year

2024

Language

English

Graduation Date

25-06-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

State-of-the-art Dutch automatic speech recognition (ASR) systems do not perform equally well for different speaker groups. Existing metrics that quantify this bias rely on demographic metadata, which is often unavailable. Recent work instead uses machine learning to find groups of similar speakers. However, the black-box nature of these methods makes the resulting groups hard to interpret. This paper proposes an interpretable approach to bias discovery that clusters speakers based on acoustic and prosodic features. Different feature subsets were compared on their ability to reveal performance disparities in five ASR systems for two separate speaking styles. Results show that these feature sets can uncover bias approaching known disparities between demographic groups. While the effectiveness of each feature set differed between the speaking styles, the most successful ones found significant disparities between clusters with diverse demographic compositions.
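
The general idea of clustering speakers on interpretable features and then comparing ASR performance across the clusters can be illustrated with a minimal sketch. It is not the thesis implementation. The abstract does not name the clustering algorithm, the individual features, or the error metric, so k-means, the two features (mean pitch and speaking rate), word error rate (WER) as the performance measure, and all data values below are assumptions made purely for illustration.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each feature column so no single feature dominates distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (assumed algorithm); returns one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def wer_by_cluster(labels, wers):
    """Mean word error rate per cluster."""
    groups = {}
    for lab, w in zip(labels, wers):
        groups.setdefault(lab, []).append(w)
    return {lab: mean(ws) for lab, ws in groups.items()}


# Made-up toy data: [mean pitch in Hz, speaking rate in syllables/s] per speaker.
features = [[110, 4.0], [115, 4.2], [120, 3.9],
            [210, 5.1], [220, 5.3], [215, 5.0]]
wers = [0.10, 0.12, 0.11, 0.20, 0.22, 0.21]  # made-up per-speaker WER

labels = kmeans(standardize(features), k=2)
per_cluster = wer_by_cluster(labels, wers)
# Difference between the worst and best cluster: a simple disparity measure.
gap = max(per_cluster.values()) - min(per_cluster.values())
print(per_cluster, gap)
```

Because each cluster is defined by features with a direct physical meaning, a large gap can be traced back to the feature values of the cluster centroids, which is the kind of interpretability the abstract contrasts with black-box speaker groupings.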

Files

Kayleigh_Jones_BSc_Thesis_Inte... (pdf)

(pdf | 0.285 Mb)

License info not available