Comparing performance of ASR systems on native Dutch children and teenagers: Google vs. Microsoft
Evaluating Speech Recognition Accuracy of state-of-the-art ASR models on Dutch child and teenager speech
G. van Dijk (TU Delft - Electrical Engineering, Mathematics and Computer Science)
O.E. (Odette) Scharenborg – Mentor (TU Delft - Multimedia Computing)
Yuanyuan Zhang – Mentor (TU Delft - Multimedia Computing)
Catherine Oertel – Graduation committee member (TU Delft - Interactive Intelligence)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Automatic Speech Recognition (ASR) technology is becoming increasingly useful in everyday life and therefore needs to be accurate across all user demographics. ASR for children presents unique challenges because of their shorter vocal tracts and irregular speech patterns. This study compares the performance of Google's and Microsoft's ASR systems on native Dutch child and teenager speech using the JASMIN-CGN dataset. Each system is evaluated on Word Error Rate (WER) and Character Error Rate (CER), with the results broken down by gender, age, and dialect region. The results indicate that Microsoft's ASR consistently outperforms Google's in terms of WER, while Google is slightly more precise in terms of CER. Microsoft is therefore considered the better-performing system overall, but depending on one's needs, for example when character-level precision matters most, Google may be the more favorable choice.
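To make the two metrics concrete, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER. It is not taken from the thesis; the function names and the Dutch example sentences are illustrative assumptions, and any text normalization (casing, punctuation) that a real evaluation would apply is left out.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, deletions and insertions."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j tokens of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example pair (not from JASMIN-CGN).
reference = "de kat zit op de mat"
hypothesis = "de kat zat op mat"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this made-up example one word is substituted and one is deleted, giving a WER of 2/6, while only 4 of the 20 reference characters are affected, giving a CER of 0.2. This shows how the two metrics can rank systems differently: a hypothesis with many near-miss words is penalized heavily by WER but only lightly by CER.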