Linguistic Typology as a Predictor of Multilingual Large Language Model Trustworthiness: A Modeling and Correlation Analysis

Master Thesis (2025)
Author(s)

T. Rood (TU Delft - Mechanical Engineering)

Contributor(s)

Holger Caesar – Mentor (TU Delft - Intelligent Vehicles)

Joost C.F. Winter – Graduation committee member (TU Delft - Human-Robot Interaction)

Faculty
Mechanical Engineering
Publication Year
2025
Language
English
Graduation Date
20-05-2025
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Trustworthiness in Multilingual Large Language Models (MLLMs) varies across languages, a variation often explained by differences in pretraining resources. While this association with pretraining data is well established, we hypothesize that typological differences between languages also influence trustworthiness. We evaluate four similarly sized MLLMs (Aya Expanse 8B, Gemma-2 9B, Ministral 7B, and Llama-3.1 8B) on trustworthiness and analyze its correlations with typological features such as nominal, word order, verbal, negation, clause structure, and phonology features, as well as token overlap, language family, and script. We find moderate to very strong correlations between specific typological features and performance, though these vary across feature-criterion pairs and appear only in language pairs involving English. Modeling these Z-scores confirms the importance of typology, significantly reducing error in five of eight tasks. Nominal, verbal, token overlap, and word order features emerge as the most consistent correlates of trustworthiness differences. This exploratory work extends established findings on cross-lingual typological influence and underscores the complexity of multilingual language modeling.

Files

Combinepdf_3_.pdf
(pdf | 0 Mb)
License info not available

File under embargo until 20-05-2027