Comparing performance of ASR systems on native Dutch children and teenagers: Google vs. Microsoft

Evaluating Speech Recognition Accuracy of state-of-the-art ASR models on Dutch child and teenager speech

Bachelor Thesis (2024)

Author(s)

G. van Dijk (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

O.E. Scharenborg – Mentor (TU Delft - Multimedia Computing)

Y. Zhang – Mentor (TU Delft - Multimedia Computing)

Catherine Oertel – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty

Electrical Engineering, Mathematics and Computer Science

Google, Dutch, Automatic speech recognition, Microsoft, Child speech, Teenager speech

To reference this document use:

https://resolver.tudelft.nl/uuid:0f2f8e50-7901-49c2-9122-e6d77ced9653

Publication Year

2024

Language

English

Graduation Date

25-06-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automatic Speech Recognition (ASR) technology is becoming increasingly useful in everyday life and therefore requires high accuracy across all user demographics. ASR for children presents unique challenges because of their shorter vocal tracts and irregular speech patterns. This study compares the performance of Google's and Microsoft's ASR systems on native Dutch child and teenager speech using the JASMIN-CGN dataset. Each system is evaluated using Word Error Rate (WER) and Character Error Rate (CER), and the differences across gender, age, and dialect region are highlighted. The results indicate that while Microsoft's ASR consistently outperforms Google's in terms of WER, Google demonstrates slightly higher precision in terms of CER. Microsoft is therefore considered the better overall system, but depending on one's needs, such as character-level precision, Google may be the more favorable choice.
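The abstract relies on WER and CER without defining them. Under the standard definition, both equal (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum edit alignment and N is the length of the reference; WER counts words and CER counts characters. The sketch below illustrates this, together with a per-group breakdown of the kind used to compare gender, age and dialect region. It is a minimal sketch under stated assumptions, not the thesis's evaluation code. The text normalization (lowercasing, collapsing whitespace, counting spaces as characters in CER), the record fields `ref`, `hyp` and the grouping key, and the choice to pool errors per group instead of averaging per-utterance rates are all hypothetical.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions) between two sequences."""
    # prev[j] holds the distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def normalize(text):
    # Assumed normalization: lowercase and collapse runs of whitespace
    return " ".join(text.lower().split())


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = normalize(reference).split()
    hyp_words = normalize(hypothesis).split()
    if not ref_words:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / number of reference characters."""
    ref_chars = list(normalize(reference))  # spaces included (assumption)
    hyp_chars = list(normalize(hypothesis))
    if not ref_chars:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def error_rates_by_group(records, key):
    """Pooled WER/CER per group: total errors / total reference length.

    `records` is a hypothetical schema: dicts with 'ref', 'hyp' and a metadata
    field such as 'gender', 'age' or 'region' named by `key`.
    """
    # Per group: [word errors, reference words, char errors, reference chars]
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for r in records:
        ref, hyp = normalize(r["ref"]), normalize(r["hyp"])
        t = totals[r[key]]
        t[0] += edit_distance(ref.split(), hyp.split())
        t[1] += len(ref.split())
        t[2] += edit_distance(list(ref), list(hyp))
        t[3] += len(ref)
    return {g: {"WER": t[0] / t[1], "CER": t[2] / t[3]} for g, t in totals.items()}


if __name__ == "__main__":
    ref, hyp = "de kat zit op de mat", "de kat zat op mat"
    print(round(wer(ref, hyp), 3))  # 0.333 (1 substitution + 1 deletion over 6 words)
    print(cer(ref, hyp))            # 0.2 (4 character edits over 20 characters)
```

The example shows why the two metrics can rank systems differently. A near-miss such as "zit" transcribed as "zat" costs a full word in WER but only one character in CER, so a system can have a higher WER while still having a lower CER.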

Files

Research_project_paper.pdf

(pdf | 0.297 Mb)

License info not available