Investigating model performance in language identification: beyond simple error statistics

Journal Article (2023)
Author(s)

Suzy J. Styles (Nanyang Technological University)

Yi Han Victoria Chua (Nanyang Technological University)

Fei Ting Woon (Nanyang Technological University)

Hexin Liu (Nanyang Technological University)

Leibny Paola Garcia Perera (Johns Hopkins University)

Sanjeev Khudanpur (Johns Hopkins University)

Andy W.H. Khong (Nanyang Technological University)

J.H.G. Dauwels (TU Delft - Signal Processing Systems)

Research Group
Signal Processing Systems
Copyright
© 2023 Suzy J. Styles, Victoria Y.H. Chua, Fei Ting Woon, Hexin Liu, Leibny Paola Garcia Perera, Sanjeev Khudanpur, Andy W.H. Khong, J.H.G. Dauwels
DOI related publication
https://doi.org/10.21437/Interspeech.2023-1707
Publication Year
2023
Language
English
Volume number
2023-August
Pages (from-to)
4129-4133
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Language development experts need tools that can automatically identify languages from fluent, conversational speech and provide reliable estimates of usage rates at the level of an individual recording. However, language identification (LID) systems are typically evaluated on metrics such as equal error rate and balanced accuracy, applied at the level of an entire speech corpus. These overview metrics do not provide information about model performance at the level of individual speakers, recordings, or units of speech with different linguistic characteristics. Overview statistics may mask systematic errors in model performance for some subsets of the data and, consequently, models may perform worse on data derived from some subsets of human speakers, creating a kind of algorithmic bias. Here, we investigate how well a number of LID systems perform on individual recordings and speech units with different linguistic properties in the MERLIon CCS Challenge, featuring accented code-switched child-directed speech.
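The contrast between corpus-level and recording-level evaluation can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the recording identifiers, the language labels, and the toy predictions below are all hypothetical, chosen only to show how a reasonable corpus-level balanced accuracy can coexist with poor performance on one recording.

```python
from collections import defaultdict


def balanced_accuracy(pairs):
    """Mean per-class recall over the classes present in the true labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_label, predicted_label in pairs:
        total[true_label] += 1
        if true_label == predicted_label:
            correct[true_label] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def per_recording_balanced_accuracy(rows):
    """Group (recording, true, predicted) rows by recording and score each group."""
    by_recording = defaultdict(list)
    for recording_id, true_label, predicted_label in rows:
        by_recording[recording_id].append((true_label, predicted_label))
    return {rec: balanced_accuracy(pairs) for rec, pairs in by_recording.items()}


# Hypothetical toy data: (recording_id, true_language, predicted_language).
# rec_A is classified perfectly; in rec_B every "zh" unit is labelled "en".
rows = (
    [("rec_A", "en", "en")] * 8
    + [("rec_A", "zh", "zh")] * 8
    + [("rec_B", "en", "en")] * 2
    + [("rec_B", "zh", "en")] * 2
)

# Pooled over the whole toy corpus, the score looks acceptable.
corpus_score = balanced_accuracy([(t, p) for _, t, p in rows])

# Broken down by recording, rec_B turns out to be at chance level.
recording_scores = per_recording_balanced_accuracy(rows)

print(f"corpus: {corpus_score:.2f}")
for recording_id, score in sorted(recording_scores.items()):
    print(f"{recording_id}: {score:.2f}")
```

In this toy case the pooled score is 0.90, while the per-recording scores are 1.00 for rec_A and 0.50 for rec_B, so the corpus-level figure alone would hide the failure on the shorter recording.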

Files

Styles23_interspeech.pdf
(pdf | 2.62 Mb)
License info not available