Linguistic Typology as a Predictor of Multilingual Large Language Model Trustworthiness: A Modeling and Correlation Analysis

Master Thesis (2025)

Author(s)

T. Rood (TU Delft - Mechanical Engineering)

Contributor(s)

Holger Caesar – Mentor (TU Delft - Mechanical Engineering)

J.C.F. de Winter – Graduation committee member (TU Delft - Mechanical Engineering)

Faculty

Mechanical Engineering

Natural Language Processing (NLP), Evaluation, Large Language Models (LLMs), Trustworthiness, Multilingual

To reference this document use

https://resolver.tudelft.nl/uuid:40c39a28-0b17-4acc-8c0a-1d3a13fcb680

Publication Year

2025

Language

English

Graduation Date

20-05-2025

Awarding Institution

Delft University of Technology

Programme

Mechanical Engineering, Vehicle Engineering, Cognitive Robotics

Downloads counter

216

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Trustworthiness in Multilingual Large Language Models (MLLMs) varies across languages, often explained by differences in pretraining resources. While this association with pretraining data is well established, we hypothesize that typological differences between languages also influence trustworthiness. We evaluate four similarly sized MLLMs (Aya Expanse 8B, Gemma-2 9B, Ministral 7B, and Llama-3.1 8B) on trustworthiness and analyze its correlations with typological features such as nominal, word order, verbal, negation, clause structure, phonology, token overlap, language family, and script. We find moderate to very strong correlations between specific typological features and performance, though these vary across feature-criterion pairs and are present only in English language pairs. Modeling these Z-scores confirms the importance of typology, significantly reducing error in five of eight tasks. Nominal, verbal, token overlap and word order emerge as the most consistent correlators with trustworthiness differences. This exploratory work reveals an expansion of established cross-lingual typological influences and underscores the complexity of multilingual language modeling.

Files

Combinepdf_3_.pdf

(pdf | 0 Mb)

License info not available

File under embargo until 20-05-2027