Analyzing the Impact of Self-Admitted Technical Debt on the Code Completion Performance of Large Language Models
L.C. Witte (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. van Deursen – Mentor (TU Delft - Software Engineering)
Maliheh Izadi – Mentor (TU Delft - Software Engineering)
J. Katzy – Mentor (TU Delft - Software Engineering)
R.M. Popescu – Mentor (TU Delft - Software Engineering)
A. Anand – Graduation committee member (TU Delft - Web Information Systems)
Abstract
Large Language Models (LLMs) are increasingly integrated into development workflows for tasks such as code completion, bug fixing, and refactoring. While prior work has shown that removing low-quality data, including data smells such as Self-Admitted Technical Debt (SATD), from training corpora can improve model performance, the isolated effect of SATD at inference time remains unclear.
This study investigates the impact of SATD on LLM performance during code completion. Using The Heap dataset, we annotate over 5 million Java files with SATD bitmasks and construct input–target pairs based on varying SATD contexts and masking strategies. Three code-generation models (SmolLM2, StarCoder2, and Mellum) are evaluated on both comment-generation and method-generation tasks using standard text-based metrics and manual semantic classification.
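To make the annotation step concrete, the following minimal Python sketch marks lines containing common SATD comment markers and returns a per-line bitmask. The marker list and the single-line-comment pattern are illustrative assumptions; the detector and bitmask encoding actually used in the thesis may differ.

import re

# Common SATD markers from the SATD literature (e.g., TODO, FIXME, HACK, XXX);
# the exact marker set used in the thesis is an assumption here.
SATD_PATTERN = re.compile(r"//.*\b(TODO|FIXME|HACK|XXX)\b", re.IGNORECASE)

def satd_bitmask(java_source: str) -> list[int]:
    # 1 for lines whose single-line comment contains a SATD marker, else 0.
    return [1 if SATD_PATTERN.search(line) else 0
            for line in java_source.splitlines()]

java_file = """public int sum(int[] xs) {
    // TODO: handle overflow properly
    int total = 0;
    for (int x : xs) total += x;
    return total;
}"""

print(satd_bitmask(java_file))  # [0, 1, 0, 0, 0, 0]

A mask like this makes it straightforward to build the input–target pairs: methods preceded by SATD-flagged lines can be selected, and the SATD comments kept or stripped to form the varying contexts.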
Our results show that the presence of SATD in the input has a negligible effect on generation quality. Instead, performance is driven primarily by target method length, structural complexity, and context size. We also find that text-based metrics may misrepresent semantic correctness in the presence of non-functional elements such as comments. These findings suggest that controlling for target complexity is more critical than accounting for the presence of SATD alone when evaluating LLM performance on code.
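As a small illustration of the metric caveat, the sketch below compares a target method body with a candidate that differs only in a leading comment. A simple character-level similarity ratio, used here as a stand-in for the study's text-based metrics (which the abstract does not name), penalises the candidate even though its runtime behaviour is identical.

from difflib import SequenceMatcher

# Hypothetical target and a semantically identical candidate completion
# that carries one extra comment line.
target = ("int total = 0;\n"
          "for (int x : xs) total += x;\n"
          "return total;")
candidate = "// handles the common case\n" + target

score = SequenceMatcher(None, target, candidate).ratio()
print(f"{score:.2f}")  # roughly 0.8: penalised despite identical behaviour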