Google Chirp vs. Whisper: Evaluating ASR performance on Dutch Native vs. Non-Native Teenager Speech

Bachelor Thesis (2024)
Author(s)

A.S.H. Jaggoe (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)

Y. Zhang – Mentor (TU Delft - Multimedia Computing)

Catherine Oertel – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
25-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automatic Speech Recognition (ASR) systems have become increasingly important for society, yet their performance varies significantly across diverse speaker groups. With a significant non-native population in the Netherlands, it is crucial that ASR systems accurately recognize diverse speech. The performance of commercial state-of-the-art ASR systems on diverse Dutch speech remains under-explored. This study evaluates two recently developed and affordable ASR systems, Google Chirp and OpenAI's Whisper, on speech from native and non-native Dutch teenagers. It measures the recognition accuracy of these systems and identifies common transcription errors. The results show slightly worse performance than previous research on non-native speech, with Whisper generally performing better than Google Chirp across the speaker groups.

Files

CSE3000_Final_Paper.pdf
(pdf | 0.138 Mb)
License info not available