Linguistic Typology as a Predictor of Multilingual Large Language Model Trustworthiness: A Modeling and Correlation Analysis
T. Rood (TU Delft - Mechanical Engineering)
Holger Caesar – Mentor (TU Delft - Intelligent Vehicles)
Joost C.F. Winter – Graduation committee member (TU Delft - Human-Robot Interaction)
Abstract
Trustworthiness in Multilingual Large Language Models (MLLMs) varies across languages, a gap often explained by differences in pretraining resources. While this association with pretraining data is well established, we hypothesize that typological differences between languages also influence trustworthiness. We evaluate four similarly sized MLLMs (Aya Expanse 8B, Gemma-2 9B, Ministral 7B, and Llama-3.1 8B) on trustworthiness and analyze its correlations with typological features such as nominal, word order, verbal, negation, clause structure, phonology, token overlap, language family, and script. We find moderate to very strong correlations between specific typological features and performance, though these vary across feature-criterion pairs and are present only in English language pairs. Modeling these Z-scores confirms the importance of typology, significantly reducing error in five of eight tasks. Nominal, verbal, token overlap, and word order features emerge as the most consistent correlates of trustworthiness differences. This exploratory work extends established findings on cross-lingual typological influences and underscores the complexity of multilingual language modeling.
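The correlation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which correlation coefficient or standardization the thesis uses, so the Pearson coefficient, the population standard deviation, the function names `z_scores` and `pearson`, and the toy numbers below are all assumptions chosen for illustration, not the author's method or data.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no spread, so Z-scores are undefined.
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation, computed as the mean product of paired Z-scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical toy data: one value per language pair.
# typological_distance: assumed distance on a single feature group.
# score_gap: assumed difference in a trustworthiness score for that pair.
typological_distance = [0.1, 0.3, 0.5, 0.7, 0.9]
score_gap = [0.02, 0.05, 0.04, 0.09, 0.12]

r = pearson(typological_distance, score_gap)
print(f"correlation between distance and score gap: {r:.3f}")
```

In this sketch a coefficient near 1 or -1 would indicate a strong linear association between a feature distance and a performance gap, while a value near 0 would indicate none; a rank-based coefficient could be substituted if the relationship is expected to be monotonic rather than linear.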
Files
File under embargo until 20-05-2027