Adapting Large Language Models to Domain-Specific Automated Program Repair

Master Thesis (2026)
Author(s)

A. Ţerna (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. van Deursen – Graduation committee member (TU Delft - Software Engineering)

M. Izadi – Mentor (TU Delft - Software Engineering)

J. Yang – Graduation committee member (TU Delft - Web Information Systems)

Timur Galimzyanov – Mentor (JetBrains)

Sergey Titov – Mentor (JetBrains)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
27-01-2026
Awarding Institution
Delft University of Technology
Project
IN5000
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automated program repair (APR) is increasingly critical in modern software development, yet language models (LMs) often struggle to capture repository-specific conventions and constraints. Small language models (SLMs) offer a cost-effective and deployable alternative, but their performance depends heavily on high-quality domain-specific supervision. In this work, we introduce a multi-teacher distillation pipeline that generates multi-turn repair trajectories, including both successful fixes and intermediate failures, to construct rich training datasets for method-level APR. We systematically analyze the impact of dataset size, repair diversity, fine-tuning strategies, hyperparameters, and reasoning supervision, aiming to identify efficient and reliable approaches for adapting SLMs to repository-specific repair tasks.

Our experiments demonstrate that parameter-efficient fine-tuning, particularly LoRA with carefully selected adapter ranks, achieves strong performance across reasoning and non-reasoning regimes while maintaining low computational cost. Explicit reasoning supervision is not required for high repair accuracy, but it significantly reduces reasoning trace lengths and inference costs. Dataset diversity and multi-turn trajectories are key to improving generalization and bridging the gap between reasoning and non-reasoning inference. Finally, this study provides empirical insights into the practical adaptation of SLMs for repository-specific APR, evaluating how strategic choices in dataset design, lightweight fine-tuning approaches, and reasoning supervision influence performance in real-world contexts.
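The low-rank adaptation the abstract refers to can be illustrated with a minimal numpy sketch: instead of updating a full weight matrix W, LoRA trains two small adapter matrices A and B and adds their scaled product to the frozen weight. The shapes, rank, and scaling below are illustrative assumptions, not the configuration used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16  # hypothetical dimensions and adapter rank

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight (not trained)
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero, so training begins at W

# Effective weight during fine-tuning: only A and B receive gradient updates.
W_eff = W + (alpha / r) * (B @ A)

full_params = W.size                     # 64 * 64 = 4096
lora_params = A.size + B.size            # 8 * 64 + 64 * 8 = 1024
print(lora_params / full_params)         # fraction of trainable parameters: 0.25
```

The adapter rank r controls the trade-off the abstract mentions: a larger rank adds capacity (and trainable parameters), while a smaller rank keeps fine-tuning cheap at the risk of underfitting repository-specific patterns.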
