Comparing performance of ASR systems on native Dutch children and teenagers: Google vs. Microsoft

Evaluating Speech Recognition Accuracy of state-of-the-art ASR models on Dutch child and teenager speech

Abstract

Automatic Speech Recognition (ASR) technology is becoming increasingly useful in everyday life, and therefore needs to be accurate across all user demographics. ASR for children presents unique challenges because of their shorter vocal tracts and irregular speech patterns. This study compares the performance of Google's and Microsoft's ASR systems on native Dutch child and teenager speech using the JASMIN-CGN dataset. Each system is evaluated on Word Error Rate (WER) and Character Error Rate (CER), and the results are broken down by gender, age, and dialect region. The results indicate that Microsoft's ASR consistently outperforms Google's in terms of WER, while Google is slightly more accurate in terms of CER. Microsoft is therefore considered the better-performing system overall, but Google may be preferable for applications that prioritize character-level precision.
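
For readers unfamiliar with the two metrics, the following is a minimal sketch of how WER and CER are commonly defined: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, computed over words for WER and over characters for CER. The function names and the example sentences are illustrative assumptions and are not taken from the study, whose exact text normalization and tooling are not described here.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example (not from JASMIN-CGN): one misrecognized character.
reference = "de kat zit op de mat"
hypothesis = "de kat zit op de mst"
print(f"WER = {wer(reference, hypothesis):.3f}")  # 1 wrong word out of 6
print(f"CER = {cer(reference, hypothesis):.3f}")  # 1 wrong character out of 20
```

A single wrong character makes the whole word count as an error for WER but only slightly raises CER, which is one way a system can rank lower on WER while ranking higher on CER.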