M. Izadi | TU Delft Repository

Enhancing Issue Tracking Efficiency with AI-Driven Natural Language Processing: Improving Classification, Association and Resolution

Master thesis (2025) - V.A. Pocheva (author) , N. Yorke-Smith (mentor) , Maliheh Izadi (mentor) , René van den Berg (mentor) , Andreea Costea (graduation committee member) , Diomidis Spinellis (mentor)

In large-scale engineering environments, efficient issue tracking is essential for timely problem resolution and knowledge reuse. However, manual classification and association of issue reports present scalability challenges, further complicated by inconsistent annotations and th ...

Gen-AI Meets Domain Expertise: LLMs for Domain Specific Code Generation

A study conducted at the ASML leveling department

Master thesis (2025) - Y. Mundhra (author) , Maliheh Izadi (mentor) , F.A. Kuipers (mentor) , Max Valk (mentor) , Lewis Binns (mentor) , U.K. Gadiraju (graduation committee member) , Goran Brkic (mentor)

Large Language Models (LLMs) have shown impressive performance in various domains, including software engineering. Code generation, a crucial aspect of software development, has seen significant improvements with the integration of AI tools. While existing LLMs have shown very goo ...

Analyzing the Impact of Self-Admitted Technical Debt on the Code Completion Performance of Large Language Models

Bachelor thesis (2025) - L.C. Witte (author) , Arie van Deursen (mentor) , Maliheh Izadi (mentor) , Jonathan Katzy (mentor) , R. M. Popescu (mentor) , Avishek Anand (graduation committee member)

Large Language Models (LLMs) are increasingly integrated into development workflows for tasks such as code completion, bug fixing, and refactoring. While prior work has shown that removing low-quality data—including data smells like Self-Admitted Technical Debt (SATD)—from traini ...

Data Hound: Linking Educational Value to LLM Code Completion Performance During Inference

Bachelor thesis (2025) - B.R.M. Annink (author) , Arie van Deursen (mentor) , Maliheh Izadi (mentor) , Jonathan Katzy (mentor) , R. M. Popescu (mentor) , Avishek Anand (graduation committee member)

This paper investigates the relation between the educational value of input code and the subsequent inference performance of code large language models (LLMs) on completion tasks. Results were attained using The Heap dataset and using SmolLM2, StarCoder 2 and Mellum models. Perfo ...

Data Hound: Analyzing Boilerplate Code Data Smell on Large Code Datasets

Bachelor thesis (2025) - S.A. Minkov (author) , A. van Deursen (mentor) , Maliheh Izadi (mentor) , J. Katzy (mentor) , R.M. Popescu (mentor)

As Large Language Models become an ever more integral part of Software Engineering, often assisting developers on coding tasks, the need for an unbiased evaluation of their performance on such tasks grows [1]. Data smells [2] are reported to have an impact on a Large Language Mod ...

Dataset Development for LLMs4Code: Licensing, Contamination, and Reproducibility Challenges

Master thesis (2025) - R. Popescu (author) , Arie van Deursen (mentor) , Maliheh Izadi (mentor) , J. Yang (graduation committee member)

The rapid rise in the popularity of large language models has highlighted the need for extensive datasets, especially for training on code. However, this growth has also raised important questions about the legal implications of using code in large language model training, particularly regarding the potential infringement of code licenses. At the same time, the availability of clean datasets for evaluating these models is becoming increasingly limited, due to a high risk of contamination which restricts the capacity for reliable research. On top of that, this requires researchers to repeatedly perform data curation steps in order to evaluate their models on downstream tasks, based on previously unseen data. This process is not only time- and resource-intensive but also introduces potential inconsistencies across studies, which can impact their reproducibility.
We address these challenges through a comprehensive licensing analysis and by developing robust datasets to support accurate and reproducible large language model evaluations. We compiled a list of 53 large language models trained on file-level code and analyzed their datasets, discovering pervasive license inconsistencies despite careful selection based on repository licenses. Our analysis, covering 514M code files, reveals 38M exact duplicates of strong copyleft code, and 171M file-leading comments, 16M of which are under copyleft licenses and another 11M of which discourage unauthorized copying. To further understand the depth of non-permissive code in public training datasets, we developed StackLessV2, a strong copyleft Java dataset decontaminated against The Stack V2 to facilitate accurate model evaluations. Our results revealed that non-permissive code is also present at the near-duplication level, although this represents a gray area in terms of legal interpretation, where the boundary between acceptable reuse and license violation is still unclear, emphasizing the need for further legal clarification. Finally, we extend this work and introduce The Heap, a large multilingual copyleft dataset covering 57 programming languages, specifically deduplicated to avoid contamination from existing open training datasets. The Heap offers a solution for conducting fair, reproducible evaluations of large language models without the significant overhead of the data curation process.

Black-box context-aware code completion

Enhancing consumer-facing code completion with low-cost general enhancements

Master thesis (2024) - T.O. van Dam (author) , Maliheh Izadi (mentor) , Arie van Deursen (mentor) , Egor Bogomolov (mentor) , J Yang (graduation committee member)

Interactive & Adaptive LLMs

Building and evaluating an LLM-based code completion plugin for JetBrains IDEs

Master thesis (2024) - F.N.M. van der Heijden (author) , A. van Deursen (mentor) , Maliheh Izadi (mentor) , U.K. Gadiraju (graduation committee member) , S. Titov (mentor) , A. Sergeyuk (mentor)

AI for Software Engineering: Reviewing and Improving Benchmarking Practices

Master thesis (2024) - P.M. de Bekker (author) , M. Izadi (mentor) , Arie van Deursen (mentor) , M.S. Pera (graduation committee member)

Artificial Intelligence (AI) has rapidly advanced, significantly impacting software engineering through AI-driven tools like ChatGPT and Copilot. These tools, which have garnered substantial commercial interest, rely heavily on the performance of their underlying models, assessed ...

Implications of LLMs4Code on Copyright Infringement

An Exploratory Study Through Red Teaming

Bachelor thesis (2024) - B. Koc (author) , Ali Al-Kaswan (mentor) , Maliheh Izadi (mentor) , Arie van Deursen (mentor) , Kaitai Liang (graduation committee member)

Large Language Models (LLMs) have experienced a rapid increase in usage across numerous sectors in recent years. However, this growth brings a greater risk of misuse. This paper explores the issue of copyright infringement facilitated by LLMs in the domain of software engineering ...

Red Teaming Large Language Models for Code

Exploring Dangerous and Unfair Software Applications

Bachelor thesis (2024) - P.S. Deatc (author) , Arie van Deursen (mentor) , Maliheh Izadi (mentor) , Ali Al-Kaswan (mentor) , Kaitai Liang (graduation committee member)

The rapid advancement of large language models has enabled numerous innovative, but also harmful applications. It is therefore essential to create these models to behave safely and responsibly. One way to improve these models is by red teaming them. In this study, we aim to ident ...

Red-Teaming Code LLMs for Malware Generation

Bachelor thesis (2024) - C. Ionescu (author) , Arie van Deursen (mentor) , Maliheh Izadi (mentor) , Ali Al-Kaswan (mentor) , Kaitai Liang (graduation committee member)

Large Language Models (LLMs) are increasingly used in software development, but their potential for misuse in generating harmful code, such as malware, raises significant concerns. We present a red-teaming approach to assess the safety and ethical alignment of LLMs in the context ...

Tokenization Matters: Training your Tokenizer Right

Testing the Impact of Tokenization on Language Modelling with (Small) Transformers

Bachelor thesis (2024) - R. Braga Medeiros Mota Borges (author) , M. Izadi (mentor) , A.D. de Moor (mentor) , Arie van Deursen (mentor) , Thomas Abeel (graduation committee member)

Large language models (LLMs) are rapidly increasing in parameter count, but this growth is not matched by an availability of high-quality data. This discrepancy raises concerns about the sustainability of current approaches to language model improvement, especially as forecasts ...

Evaluating Adaptive Activation Functions in Language Models

Does choice of activation function matter in smaller Language Models?

Bachelor thesis (2024) - F. Ignijic (author) , M. Izadi (mentor) , Arie van Deursen (mentor) , Aral de Moor (mentor) , Thomas Abeel (graduation committee member)

The rapid expansion of large language models (LLMs) driven by the transformer architecture has raised concerns about the lack of high-quality training data. This study investigates the role of activation functions in smaller-scale language models, specifically those with app ...

Sparse Transformers are (in)Efficient Learners

Comparing Sparse Feedforward Layers in Small Transformers

Bachelor thesis (2024) - Y. Wu (author) , Arie van Deursen (mentor) , M. Izadi (mentor) , Aral de Moor (mentor) , Thomas Abeel (graduation committee member)

Although transformers are state-of-the-art models for natural language tasks, obtaining reasonable performance still often requires large transformers which are expensive to train and deploy. Fortunately, there are techniques to increase the size of transformers without extra com ...

LLM of Babel

An analysis of the behavior of large language models when performing Java code summarization in Dutch

Bachelor thesis (2024) - G.G.S. Panchu (author) , J. Katzy (mentor) , Maliheh Izadi (mentor) , A. van Deursen (mentor) , M.A. Migut (graduation committee member)

How well do large language models (LLMs) infer text in a non-English context when performing code summarization? The goal of this paper was to understand the mistakes made by LLMs when performing code summarization in Dutch. We categorized the mistakes made by CodeQwen1.5-7b when ...

LLM of Babel: Evaluation of LLMs on code for non-English use-cases

Bachelor thesis (2024) - P. Loizides (author) , J. Katzy (mentor) , Maliheh Izadi (mentor) , A. van Deursen (mentor) , M.A. Migut (graduation committee member)

This paper evaluates the performance of Large Language Models, specifically StarCoder 2, in non-English code summarization, with a focus on the Greek language. We establish a hierarchical error taxonomy through an open coding approach to enhance the understanding and improvement ...

LLM of Babel: Evaluation of LLMs on code for non-English use-cases

Bachelor thesis (2024) - M. Ziemlewski (author) , J. Katzy (mentor) , A. van Deursen (mentor) , Maliheh Izadi (mentor) , M.A. Migut (graduation committee member)

This research evaluates the performance of Meta's Code Llama 7B model in generating comments for Java code written in Polish. Using a mixed-methods approach, we apply both quantitative and qualitative methods to discover the model's accuracy and limitations. We preprocess a dat ...

Evaluating CodeGemma-7B for Dutch Code Comment Generation

Bachelor thesis (2024) - S.R. Vermeulen (author) , Maliheh Izadi (mentor) , A. van Deursen (mentor) , J. Katzy (mentor) , M.A. Migut (graduation committee member)

Interest in Large Language Models is growing, especially in software development tasks such as code completion and comment generation. However, most Large Language Models are primarily trained on English language data, raising concerns about their effectiveness when applied to ot ...

LLM of Babel: Evaluation of LLMs on code for non-English use-cases

Bachelor thesis (2024) - Y. Huang (author) , A. van Deursen (mentor) , Maliheh Izadi (mentor) , J. Katzy (mentor) , M.A. Migut (graduation committee member)

After the emergence of BERT, Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities and have seen widespread adoption globally, particularly in the field of programming. However, current evaluations and benchmarks of LLMs on code primarily focus on En ...