T.J. De Valck
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
2 records found
Current state-of-the-art automatic speech recognition (ASR) systems recognize typical speech very well. However, recent research has shown that their performance degrades for “diverse” speech, i.e., speech that diverges from “typical” speech due to, among other things, demographic and sociolinguistic factors. Given the rapid development of ASR technologies, we examined the performance on Dutch diverse speech of nine recently released ASR systems developed by Google, Microsoft, Meta, NVIDIA, and OpenAI, and of three custom ASR models trained from scratch. Our results showed that, although overall recognition results differ substantially between systems, all systems show similar patterns in recognition performance across diverse speaker groups. For most ASR systems and models, language proficiency differences and severe speech motor impairment had a greater impact on performance disparities between speaker groups than demographic or sociolinguistic factors. This indicates that acoustic variability due to demographic and sociolinguistic factors is well represented in “typical speech” training data and is consequently well modeled. Furthermore, we found that differences in data processing pipelines and decoding setups significantly influenced recognition performance. Importantly, updates to company-developed ASR systems do not always improve performance for diverse speaker groups or reduce the performance disparities between them.
Bachelor thesis
(2024)
-
T.J. De Valck, O.E. Scharenborg, Y. Zhang, C.R.M.M. Oertel Genannt Bierbach
Automatic Speech Recognition (ASR) systems are found in many places and are used by many people. Some groups of people, specifically older Dutch adults, are recognized less well by these systems. Given the aging population of the Netherlands, more inclusive ASR systems would help older adults remain independent. I tested the ASR systems of Google and Microsoft on the JASMIN dataset and compared them using word error rate (WER), word information lost (WIL), and character error rate (CER). Results show Microsoft outperforming Google, with an average word error rate of 19.6% compared to 27.35%. However, Google is less biased with respect to gender and age. Microsoft was less biased with respect to region, but only by a small margin. Overall, the most notable findings from both systems are a small bias toward female speakers and a strong bias against speakers from the southern regions of the Netherlands. These findings highlight the need for more inclusive ASR systems, which would enhance the independence of older adults.
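The abstract names three metrics but does not say how they were computed. As an illustration only, the sketch below uses the standard definitions: WER and CER as Levenshtein edit distance divided by reference length (over words and characters respectively), and WIL as 1 - (H / N_ref) * (H / N_hyp), where H is the number of correctly recognized words. The function names and the example sentence pair are hypothetical and are not taken from the thesis or from the JASMIN dataset.

```python
from typing import Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int]:
    """Return (edit_distance, hits) for a minimum-cost Levenshtein alignment.

    Ties in cost are broken in favor of the alignment with more hits.
    """
    # prev[j] holds (cost, hits) for aligning ref[:i-1] with hyp[:j].
    prev = [(j, 0) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        curr = [(i, 0)]
        for j in range(1, len(hyp) + 1):
            match = ref[i - 1] == hyp[j - 1]
            # Hits are negated so that min() prefers more hits at equal cost.
            candidates = [
                (prev[j - 1][0] + (0 if match else 1),
                 -(prev[j - 1][1] + (1 if match else 0))),  # hit or substitution
                (prev[j][0] + 1, -prev[j][1]),              # deletion
                (curr[j - 1][0] + 1, -curr[j - 1][1]),      # insertion
            ]
            cost, neg_hits = min(candidates)
            curr.append((cost, -neg_hits))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    distance, _ = align_counts(ref, hyp)
    return distance / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    distance, _ = align_counts(list(reference), list(hypothesis))
    return distance / len(reference)


def wil(reference: str, hypothesis: str) -> float:
    """Word information lost: 1 - (H / N_ref) * (H / N_hyp)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref or not hyp:
        return 1.0  # assumption: nothing is preserved if either side is empty
    _, hits = align_counts(ref, hyp)
    return 1.0 - (hits / len(ref)) * (hits / len(hyp))


# Hypothetical example pair (not from the thesis data).
reference = "de kat zit op de mat"
hypothesis = "de kat zat op mat"
print(f"WER={wer(reference, hypothesis):.3f}",
      f"WIL={wil(reference, hypothesis):.3f}",
      f"CER={cer(reference, hypothesis):.3f}")
```

In practice, reference and hypothesis text are usually normalized (casing, punctuation, number formatting) before scoring, and that choice alone can shift the reported error rates. Corpus-level figures are commonly obtained by summing errors and reference lengths over all utterances rather than averaging per-utterance rates. Whether the thesis did either is not stated in the abstract.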