Trustworthiness in Multilingual Large Language Models (MLLMs) varies across languages, often explained by differences in pretraining resources. While this association with pretraining data is well established, we hypothesize that typological differences between languages also influence trustworthiness. We evaluate four similarly sized MLLMs (Aya Expanse 8B, Gemma-2 9B, Ministral 7B, and Llama-3.1 8B) on trustworthiness and analyze its correlations with typological features such as nominal, word order, verbal, negation, clause structure, phonology, token overlap, language family, and script. We find moderate to very strong correlations between specific typological features and performance, though these vary across feature-criterion pairs and are present only in English language pairs. Modeling these Z-scores confirms the importance of typology, significantly reducing error in five of eight tasks. Nominal, verbal, token overlap, and word order features emerge as the most consistent correlates of trustworthiness differences. This exploratory work reveals an expansion of established cross-lingual typological influences and underscores the complexity of multilingual language modeling.