Analyzing the Impact of Self-Admitted Technical Debt on the Code Completion Performance of Large Language Models
L.C. Witte (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. van Deursen – Mentor (TU Delft - Software Engineering)
Maliheh Izadi – Mentor (TU Delft - Software Engineering)
J. Katzy – Mentor (TU Delft - Software Engineering)
R.M. Popescu – Mentor (TU Delft - Software Engineering)
A. Anand – Graduation committee member (TU Delft - Web Information Systems)
Abstract
Large Language Models (LLMs) are increasingly integrated into development workflows for tasks such as code completion, bug fixing, and refactoring. While prior work has shown that removing low-quality data, including data smells such as Self-Admitted Technical Debt (SATD), from training corpora can improve model performance, the isolated effect of SATD at inference time remains unclear.
This study investigates the impact of SATD on LLM performance during code completion. Using The Heap dataset, we annotate over 5 million Java files with SATD bitmasks and construct input–target pairs based on varying SATD contexts and masking strategies. Three code-generation models (SmolLM2, StarCoder2, and Mellum) are evaluated on both comment-generation and method-generation tasks using standard text-based metrics and manual semantic classification.
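To make the annotation step concrete, the following minimal Python sketch marks lines containing common SATD comment markers and returns a per-line bitmask. The marker list and the single-line-comment pattern are illustrative assumptions; the detector and bitmask encoding actually used in the thesis may differ.

import re

# Common SATD markers from the SATD literature (e.g., TODO, FIXME, HACK, XXX);
# the exact marker set used in the thesis is an assumption here.
SATD_PATTERN = re.compile(r"//.*\b(TODO|FIXME|HACK|XXX)\b", re.IGNORECASE)

def satd_bitmask(java_source: str) -> list[int]:
    # 1 for lines whose single-line comment contains a SATD marker, else 0.
    return [1 if SATD_PATTERN.search(line) else 0
            for line in java_source.splitlines()]

java_file = """public int sum(int[] xs) {
    // TODO: handle overflow properly
    int total = 0;
    for (int x : xs) total += x;
    return total;
}"""

print(satd_bitmask(java_file))  # [0, 1, 0, 0, 0, 0]

A mask like this makes it straightforward to build the input–target pairs: methods preceded by SATD-flagged lines can be selected, and the SATD comments kept or stripped to form the varying contexts.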
Our results show that the presence of SATD in the input has a negligible effect on generation quality. Instead, performance is driven primarily by target method length, structural complexity, and context size. We also find that text-based metrics may misrepresent semantic correctness in the presence of non-functional elements such as comments. These findings suggest that controlling for target complexity is more critical than accounting for the presence of SATD alone when evaluating LLM performance on code.
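As a small illustration of the metric caveat, the sketch below compares a target method body with a candidate that differs only in a leading comment. A simple character-level similarity ratio, used here as a stand-in for the study's text-based metrics (which the abstract does not name), penalises the candidate even though its runtime behaviour is identical.

from difflib import SequenceMatcher

# Hypothetical target and a semantically identical candidate completion
# that carries one extra comment line.
target = ("int total = 0;\n"
          "for (int x : xs) total += x;\n"
          "return total;")
candidate = "// handles the common case\n" + target

score = SequenceMatcher(None, target, candidate).ratio()
print(f"{score:.2f}")  # roughly 0.8: penalised despite identical behaviour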