How Good Are State-of-the-Art Automatic Speech Recognition Systems in Recognizing Dutch Diverse Speech?

An Evaluation of Meta MMS and OpenAI Whisper on Native and Non-Native Dutch Speech


Abstract

Automatic speech recognition (ASR) is increasingly used in daily applications, such as voice-activated virtual assistants like Siri and Alexa, real-time transcription for meetings and lectures, and voice commands for smart home devices. However, studies show that even state-of-the-art (SotA) ASR systems do not recognize everyone's speech equally well.

To the best of my knowledge, this paper is the first to evaluate the performance of Meta's SotA ASR system, Massively Multilingual Speech (MMS), on native and non-native Dutch speech. Using the Jasmin Corpus, which includes a diverse set of both native and non-native Dutch speakers, this study assesses performance with metrics such as word error rate (WER), character error rate (CER), and word information lost (WIL). Additionally, the same methodology is applied to the same data using OpenAI's ASR system, Whisper, to provide a comparative analysis.
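As an illustration of how these error metrics can be computed, the following is a minimal sketch using the open-source jiwer package; the reference and hypothesis strings are placeholders, not actual Jasmin Corpus transcripts, and the study itself may use different tooling.

```python
import jiwer

# Placeholder reference (ground truth) and hypothesis (ASR output) transcripts;
# in the actual evaluation these would come from the Jasmin Corpus annotations
# and from MMS/Whisper decoding, respectively.
reference = "de kat zit op de mat"
hypothesis = "de kat zat op mat"

# Word error rate: (substitutions + deletions + insertions) / reference word count
wer = jiwer.wer(reference, hypothesis)

# Character error rate: the same edit-distance idea at the character level
cer = jiwer.cer(reference, hypothesis)

# Word information lost: 1 - hits^2 / (reference words * hypothesis words)
wil = jiwer.wil(reference, hypothesis)

print(f"WER: {wer:.3f}  CER: {cer:.3f}  WIL: {wil:.3f}")
```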

The paper analyzes the WER, CER, and WIL error metrics and processing time, and investigates the best-suited beam size for Whisper. It also breaks down the types of errors (deletions, insertions, and substitutions) made by each model across different age groups of Dutch speakers, as sketched below.
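As a rough sketch of this kind of per-utterance analysis, the snippet below transcribes one audio file with the openai-whisper package at a chosen beam size and counts substitutions, deletions, and insertions with jiwer. The model size, beam size, file path, and reference text are illustrative assumptions, not the exact configuration or data used in the study.

```python
import jiwer
import whisper

# Illustrative settings; the study compares several beam sizes.
MODEL_SIZE = "large-v2"                  # assumed Whisper model size
BEAM_SIZE = 5                            # one candidate beam size
AUDIO_PATH = "speaker_001.wav"           # hypothetical Jasmin Corpus fragment
REFERENCE = "plaats hier de referentietranscriptie"  # placeholder ground truth

model = whisper.load_model(MODEL_SIZE)

# Decode Dutch audio with beam search at the chosen beam size.
result = model.transcribe(AUDIO_PATH, language="nl", beam_size=BEAM_SIZE)
hypothesis = result["text"]

# Align reference and hypothesis to count the three error types.
out = jiwer.process_words(REFERENCE, hypothesis)
print(f"substitutions={out.substitutions} "
      f"deletions={out.deletions} insertions={out.insertions}")
print(f"WER={out.wer:.3f}")
```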