Formant-based vowel categorization for cross-lingual phone recognition
Abstract
Multilingual phone recognition models can learn language-independent pronunciation patterns from large volumes of spoken data and recognize them across languages. This potential can be harnessed to improve speech technologies for underresourced languages. However, these models are typically trained on phonological representations of speech sounds, which do not necessarily reflect the phonetic realization of speech. A mismatch between a phonological symbol and its phonetic realizations can lead to phone confusions and reduce performance. This work introduces formant-based vowel categorization aimed at improving cross-lingual vowel recognition by uncovering a vowel's phonetic quality from its formant frequencies, and reorganizing the vowel categories in a multilingual speech corpus to increase their consistency across languages. The work investigates vowel categories obtained from a trilingual multi-dialect speech corpus of Danish, Norwegian, and Swedish using three categorization techniques. Cross-lingual phone recognition experiments reveal that uniting vowel categories of different languages into a set of shared formant-based categories improves cross-lingual recognition of the shared vowels, but also interferes with recognition of vowels not present in one or more training languages. Cross-lingual evaluation on regional dialects provides inconclusive results. Nevertheless, improved recognition of individual vowels can translate to improvements in overall phone recognition on languages unseen during training.
Files
File under embargo until 27-09-2025