Google Chirp vs. Whisper: Evaluating ASR performance on Dutch Native vs. Non-Native Teenager Speech

Bachelor thesis (2024)

Authors

A.S.H. Jaggoe (Electrical Engineering, Mathematics and Computer Science)

Contributors

O.E. Scharenborg (mentor)

Y. Zhang (mentor)

Catharine Oertel, Interactive Intelligence (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

More Info

To reference this document use:

http://resolver.tudelft.nl/uuid:53f44557-630c-40a1-87da-950c191a252b

Published Date

25-06-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automatic Speech Recognition (ASR) systems have become increasingly important for society, yet their performance varies significantly across speaker groups. With a significant non-native population in the Netherlands, it is crucial that ASR systems accurately recognize diverse speech. However, the performance of commercial state-of-the-art ASR systems on diverse Dutch speech remains under-explored. This study evaluates two recently developed and affordable ASR systems, Google Chirp and OpenAI's Whisper, on speech from native and non-native Dutch teenagers, measuring their recognition accuracy and identifying common transcription errors. The results show slightly worse performance on non-native speech than reported in previous research, and Whisper generally performs better than Google Chirp across the speaker groups.
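The abstract does not name the accuracy metric. ASR recognition accuracy is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference, divided by the number of reference words. The sketch below is a minimal illustration under that assumption, not the thesis's evaluation code; the function name `word_error_rate` is hypothetical, and it assumes transcripts are already normalized (no casing or punctuation handling).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: word-level Levenshtein distance / reference length.

    Assumes both transcripts are already normalized (lowercase, no punctuation).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("naar" -> "na") out of four reference words.
print(word_error_rate("ik ga naar school", "ik ga na school"))  # 0.25
```

Lower is better; because insertions are counted, WER can exceed 1.0 when a system outputs many extra words.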

Files

CSE3000_Final_Paper.pdf

(.pdf | 0.138 Mb)