Investigating model performance in language identification
beyond simple error statistics
Suzy J. Styles (Nanyang Technological University)
Yi Han Victoria Chua (Nanyang Technological University)
Fei Ting Woon (Nanyang Technological University)
Hexin Liu (Nanyang Technological University)
Leibny Paola Garcia Perera (Johns Hopkins University)
Sanjeev Khudanpur (Johns Hopkins University)
Andy W.H. Khong (Nanyang Technological University)
J.H.G. Dauwels (TU Delft - Signal Processing Systems)
Abstract
Language development experts need tools that can automatically identify languages from fluent, conversational speech and provide reliable estimates of usage rates at the level of an individual recording. However, language identification (LID) systems are typically evaluated on metrics such as equal error rate and balanced accuracy, applied at the level of an entire speech corpus. These overview metrics provide no information about model performance at the level of individual speakers, recordings, or units of speech with different linguistic characteristics. Overview statistics may therefore mask systematic errors for some subsets of the data, so a model can perform worse on data derived from some groups of human speakers, creating a kind of algorithmic bias. Here, we investigate how well a number of LID systems perform on individual recordings and on speech units with different linguistic properties in the MERLIon CCS Challenge, which features accented, code-switched, child-directed speech.
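To illustrate the distinction the abstract draws between corpus-level and per-recording evaluation, the following minimal sketch (not from the paper; the recording IDs, labels, and predictions are invented for illustration, and scikit-learn is assumed to be available) contrasts a single balanced-accuracy score over an entire evaluation set with the same metric computed separately for each recording, where errors concentrated in one recording become visible.

```python
# Hypothetical example: corpus-level vs. per-recording balanced accuracy.
# All data below are toy values for illustration only.
from collections import defaultdict
from sklearn.metrics import balanced_accuracy_score

# Toy segment-level predictions: (recording_id, true_language, predicted_language)
results = [
    ("rec01", "English",  "English"),
    ("rec01", "Mandarin", "Mandarin"),
    ("rec02", "Mandarin", "English"),   # errors concentrated in one recording
    ("rec02", "Mandarin", "English"),
    ("rec02", "English",  "English"),
]

# Corpus-level score: one number for the whole evaluation set.
y_true = [true for _, true, _ in results]
y_pred = [pred for _, _, pred in results]
print("corpus balanced accuracy:", balanced_accuracy_score(y_true, y_pred))

# Per-recording scores: the same predictions, grouped by recording.
by_recording = defaultdict(list)
for rec_id, true, pred in results:
    by_recording[rec_id].append((true, pred))

for rec_id, pairs in sorted(by_recording.items()):
    t, p = zip(*pairs)
    print(rec_id, "balanced accuracy:", balanced_accuracy_score(t, p))
```

In this toy example the corpus-level score looks moderate, while the per-recording breakdown shows one recording scored perfectly and the other poorly, which is the kind of speaker- or recording-level variation that overview statistics can hide.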