J.B. Katzy | TU Delft Repository

Data Hound: Linking Educational Value to LLM Code Completion Performance During Inference

Bachelor thesis (2025) - B.R.M. Annink (author), Arie van Deursen (mentor), Maliheh Izadi (mentor), Jonathan Katzy (mentor), R. M. Popescu (mentor), Avishek Anand (graduation committee member)

This paper investigates the relation between the educational value of input code and the subsequent inference performance of code large language models (LLMs) on completion tasks. Results were obtained using The Heap dataset and the SmolLM2, StarCoder 2, and Mellum models. Perfo ...

Analyzing the Impact of Self-Admitted Technical Debt on the Code Completion Performance of Large Language Models

Bachelor thesis (2025) - L.C. Witte (author), Arie van Deursen (mentor), Maliheh Izadi (mentor), Jonathan Katzy (mentor), R. M. Popescu (mentor), Avishek Anand (graduation committee member)

Large Language Models (LLMs) are increasingly integrated into development workflows for tasks such as code completion, bug fixing, and refactoring. While prior work has shown that removing low-quality data—including data smells like Self-Admitted Technical Debt (SATD)—from traini ...

Data Hound: Analyzing Boilerplate Code Data Smell on Large Code Datasets

Bachelor thesis (2025) - S.A. Minkov (author), A. van Deursen (mentor), Maliheh Izadi (mentor), J. Katzy (mentor), R.M. Popescu (mentor)

As Large Language Models become an ever more integral part of Software Engineering, often assisting developers on coding tasks, the need for an unbiased evaluation of their performance on such tasks grows [1]. Data smells [2] are reported to have an impact on a Large Language Mod ...

Data hound: Analysing non-English data smells in large code datasets

Bachelor thesis (2025) - B.M. Buzatu (author), Arie van Deursen (graduation committee member), Maliheh Izadi (graduation committee member), Jonathan Katzy (mentor), R. M. Popescu (mentor), Avishek Anand (graduation committee member)

Large Language Models (LLMs) are increasingly used for code-centric tasks. However, their training data often exhibits data smells that may hinder downstream quality. This research focuses on the “Uneven Natural Languages” smell and the presence of non-English text in source code ...

LLM of Babel: Evaluation of LLMs on code for non-English use-cases

Bachelor thesis (2024) - P. Loizides (author), J. Katzy (mentor), Maliheh Izadi (mentor), A. van Deursen (mentor), M.A. Migut (graduation committee member)

This paper evaluates the performance of Large Language Models, specifically StarCoder 2, in non-English code summarization, with a focus on the Greek language. We establish a hierarchical error taxonomy through an open coding approach to enhance the understanding and improvement ...

LLM of Babel: Evaluation of LLMs on code for non-English use-cases

Bachelor thesis (2024) - Y. Huang (author), A. van Deursen (mentor), Maliheh Izadi (mentor), J. Katzy (mentor), M.A. Migut (graduation committee member)

Since the emergence of BERT, Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities and have seen widespread adoption globally, particularly in the field of programming. However, current evaluations and benchmarks of LLMs on code primarily focus on En ...

LLM of Babel

An analysis of the behavior of large language models when performing Java code summarization in Dutch

Bachelor thesis (2024) - G.G.S. Panchu (author), J. Katzy (mentor), Maliheh Izadi (mentor), A. van Deursen (mentor), M.A. Migut (graduation committee member)

How well do large language models (LLMs) infer text in a non-English context when performing code summarization? The goal of this paper was to understand the mistakes made by LLMs when performing code summarization in Dutch. We categorized the mistakes made by CodeQwen1.5-7b when ...

Evaluating CodeGemma-7B for Dutch Code Comment Generation

Bachelor thesis (2024) - S.R. Vermeulen (author), Maliheh Izadi (mentor), A. van Deursen (mentor), J. Katzy (mentor), M.A. Migut (graduation committee member)

Interest in Large Language Models is growing, especially in software development tasks such as code completion and comment generation. However, most Large Language Models are primarily trained on English language data, raising concerns about their effectiveness when applied to ot ...

LLM of Babel: Evaluation of LLMs on code for non-English use-cases

Bachelor thesis (2024) - M. Ziemlewski (author), J. Katzy (mentor), A. van Deursen (mentor), Maliheh Izadi (mentor), M.A. Migut (graduation committee member)

This research evaluates the performance of Meta's Code Llama 7B model in generating comments for Java code written in Polish. Using a mixed-methods approach, we apply both quantitative and qualitative methods to assess the model's accuracy and limitations. We preprocess a dat ...

A Cross-Lingual Evaluation of CodeGen's Performance in Code Completion

Bachelor thesis (2023) - M.L. Keeler (author), Arie van Deursen (mentor), Azqa Nadeem (graduation committee member), Maliheh Izadi (mentor), J.B. Katzy (mentor)

We present an investigation into the relationship between the average depth of the first correct prediction and the performance of CodeGen. This was done on a dataset of code files written in C++, Go, Java, Julia, Kotlin, and Python. The analysis involved investigatin ...

Cross-lingual Performance of CodeGPT on the Code Completion Task

Bachelor thesis (2023) - H.N. Kuo (author), Maliheh Izadi (mentor), J. Katzy (mentor), A. van Deursen (mentor), A. Nadeem (graduation committee member)

The development of contemporary source code auto-completion tools has significantly boosted the productivity and efficiency of developers. In 2021, the GPT-2-based Transformer CodeGPT was developed to support code completion and text-to-code generation. Similarly to most code model ...

Evaluating Large Language Model Performance on User and Language Defined Elements in Code

Bachelor thesis (2023) - E.J. Mekkes (author), A. van Deursen (mentor), Maliheh Izadi (mentor), J. Katzy (mentor), A. Nadeem (graduation committee member)

Large Language Models of code have seen significant jumps in performance recently. However, these jumps tend to accompany a notable and perhaps concerning increase in scale and costs. We contribute an evaluation of prediction performance with respect to model size by assessing th ...

A Study on the Impact of Common Code Structures on CodeParrot’s Autocompletion Performance

Bachelor thesis (2023) - R. Popescu (author), Maliheh Izadi (mentor), J. Katzy (mentor), A. van Deursen (mentor), A. Nadeem (graduation committee member)

In recent years, deep learning techniques, particularly transformer models, have demonstrated remarkable advancements in the accuracy and efficiency of language models. These models provide the foundation for many natural language processing tasks, including code completion. The ...