This thesis investigates whether large language models (LLMs) produce consistent and neutral outputs when the same prompts are given in English and Arabic. It begins by reviewing technological, philosophical, psychological, and linguistic factors that can influence the behavior of multilingual models. Consistency is defined as stability in content and tone, while neutrality refers to the absence of biased or emotionally loaded framing.
Ten prompts (seven sensitive and three non-sensitive) were refined through an iterative English ablation process and then translated into Arabic. Six leading LLMs were queried in both languages, and their outputs were analyzed with automated sentiment analysis to measure differences in emotional tone. In parallel, a survey of bilingual English and Arabic speakers evaluated the model responses on sentiment consistency, factual consistency, and perceived neutrality in each language, as well as the neutrality of the prompts' framing.
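As a rough illustration of the automated step, the sketch below scores paired English and Arabic responses with a multilingual sentiment classifier; the specific classifier named here is an assumption for illustration only and is not necessarily the tool used in this study, and the response texts are placeholders.

```python
# Illustrative sketch: compare the sentiment of paired English/Arabic responses.
# The model name below is an assumed multilingual sentiment classifier from the
# Hugging Face hub, not necessarily the one used in this thesis.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
)

# Paired responses to the same prompt in both languages (placeholder text).
pairs = [
    {
        "prompt_id": 1,
        "en": "The policy has had mixed economic effects.",
        "ar": "كان للسياسة آثار اقتصادية متباينة.",
    },
]

for pair in pairs:
    en_result = sentiment(pair["en"])[0]  # e.g. {'label': 'neutral', 'score': 0.87}
    ar_result = sentiment(pair["ar"])[0]
    print(pair["prompt_id"], en_result["label"], ar_result["label"])
```

In practice, label or score differences between the two languages for the same prompt would then be aggregated per model and per prompt category before comparison with the survey ratings.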
Results indicate that non-sensitive prompts are rated as less neutral but show fewer inconsistencies in sentiment and factuality between English and Arabic outputs. In contrast, sensitive prompts are perceived as more neutral overall but exhibit larger differences in both sentiment and factual alignment. Among the models tested, some demonstrate higher consistency across languages than others. Automated analysis shows that English outputs often carry more positive or mixed tones, while Arabic outputs lean toward neutrality. Human evaluations mirror these patterns for non-sensitive topics but diverge for the more politically charged prompts, indicating that automated tools align poorly with human perception in sensitive contexts.
These findings underscore the importance of combining automated metrics with human judgment when assessing multilingual reliability and neutrality. The study suggests that improving the balance of training data, increasing transparency about language-specific behaviors, and guiding users to anticipate multilingual variation are key to developing fairer and more reliable GenAI systems.