Large Language Models (LLMs) are increasingly used for code-centric tasks. However, their training data often exhibits data smells that may hinder downstream quality. This research focuses on the “Uneven Natural Languages” smell, i.e., the presence of non-English text in source code, and investigates its effect on LLM-based code generation and summarisation. We construct a three-stage pipeline (Detection, Generation, Evaluation) that annotates every character in a file with its predicted language using Tree-sitter, FastText, and pycld2; masks target spans via causal masking and Fill-in-the-Middle (FIM); and prompts three selected models (SmolLM2, StarCoder 2, and Mellum-4B). The pipeline is applied to The Heap dataset, restricted here to its Java subset.
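To make the Detection stage concrete, the sketch below combines Tree-sitter, fastText, and pycld2 to label the natural-language spans of a Java file. It assumes recent Python bindings for tree-sitter and tree-sitter-java and a locally available fastText lid.176.bin language-identification model; the function and variable names are illustrative and do not correspond to the released implementation.

\begin{verbatim}
# Minimal sketch of the Detection stage (illustrative, not the released code).
import fasttext
import pycld2 as cld2
import tree_sitter_java as tsjava
from tree_sitter import Language, Parser

JAVA = Language(tsjava.language())
parser = Parser(JAVA)
lid = fasttext.load_model("lid.176.bin")  # fastText language-ID model

# Java node types that carry natural-language text.
NATURAL_TEXT_NODES = {"line_comment", "block_comment", "string_literal"}

def natural_language_spans(source: bytes):
    """Yield (start_byte, end_byte, text) for comments and string literals."""
    tree = parser.parse(source)
    stack = [tree.root_node]
    while stack:
        node = stack.pop()
        if node.type in NATURAL_TEXT_NODES:
            text = source[node.start_byte:node.end_byte].decode("utf-8", "replace")
            yield node.start_byte, node.end_byte, text
        stack.extend(node.children)

def predict_language(text: str):
    """Combine fastText and pycld2 predictions for one span."""
    labels, probs = lid.predict(text.replace("\n", " "))  # fastText rejects newlines
    ft_lang = labels[0].replace("__label__", "")
    _, _, details = cld2.detect(text)
    cld_lang = details[0][1]  # ISO code of pycld2's top guess
    return ft_lang, float(probs[0]), cld_lang

if __name__ == "__main__":
    # "Calcula la suma" / "hola mundo" are Spanish ("computes the sum" / "hello world").
    code = b'class Demo { // Calcula la suma\n  String s = "hola mundo"; }'
    for start, end, span in natural_language_spans(code):
        print(start, end, predict_language(span))
\end{verbatim}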
In 3.35 million Java files, we find that English tokens account for more than 90\% of comments, strings, and identifiers, while Chinese, Spanish, Portuguese, and French form a long-tailed minority. Despite this skew, LLMs achieve marginally higher BLEU, METEOR, ROUGE, and Exact Match scores when non-English elements are present or masked. Mellum consistently yields the most fluent continuations; StarCoder 2 retains broader token recall; SmolLM2 lags on both axes, reflecting its smaller capacity.
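The following is a minimal sketch of how per-prediction BLEU, METEOR, ROUGE-L, and Exact Match scores can be computed with nltk and rouge-score; the tokenisation, smoothing, and aggregation choices shown here are assumptions and need not match the evaluation settings used in this study.

\begin{verbatim}
# Illustrative per-prediction scoring; requires nltk.download("wordnet") once.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

def score_completion(reference: str, prediction: str) -> dict:
    ref_tokens, pred_tokens = reference.split(), prediction.split()
    bleu = sentence_bleu([ref_tokens], pred_tokens,
                         smoothing_function=SmoothingFunction().method1)
    meteor = meteor_score([ref_tokens], pred_tokens)
    rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(
        reference, prediction)["rougeL"].fmeasure
    exact = float(reference.strip() == prediction.strip())
    return {"BLEU": bleu, "METEOR": meteor, "ROUGE-L": rouge_l, "ExactMatch": exact}

print(score_completion("return a + b;", "return a + b;"))
\end{verbatim}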
Our publicly available code enables reproducible assessment of multilingual data smells and lays the groundwork for cleaner, language-aware pre-training corpora and more robust multilingual code assistants.