Adapting Large Language Models to Domain-Specific Automated Program Repair
A. Ţerna (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. van Deursen – Graduation committee member (TU Delft - Software Engineering)
M. Izadi – Mentor (TU Delft - Software Engineering)
J. Yang – Graduation committee member (TU Delft - Web Information Systems)
Timur Galimzyanov – Mentor (JetBrains)
Sergey Titov – Mentor (JetBrains)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Automated program repair (APR) is increasingly critical in modern software development, yet language models (LMs) often struggle to capture repository-specific conventions and constraints. Small language models (SLMs) offer a cost-effective and deployable alternative, but their performance depends heavily on high-quality domain-specific supervision. In this work, we introduce a multi-teacher distillation pipeline that generates multi-turn repair trajectories, including both successful fixes and intermediate failures, to construct rich training datasets for method-level APR. We systematically analyze the impact of dataset size, repair diversity, fine-tuning strategies, hyperparameters, and reasoning supervision, aiming to identify efficient and reliable approaches for adapting SLMs to repository-specific repair tasks.
Our experiments demonstrate that parameter-efficient fine-tuning, particularly LoRA with carefully selected adapter ranks, achieves strong performance across reasoning and non-reasoning regimes while maintaining low computational cost. Explicit reasoning supervision is not required for high repair accuracy, but it significantly reduces reasoning trace lengths and inference costs. Dataset diversity and multi-turn trajectories are key to improving generalization and to bridging the gap between reasoning and non-reasoning inference. Overall, this study provides empirical insights into the practical adaptation of SLMs for repository-specific APR, showing how choices in dataset design, lightweight fine-tuning, and reasoning supervision influence performance in real-world contexts.
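To make the role of the adapter rank concrete, the following is a minimal, self-contained sketch of the LoRA idea referenced above: the frozen base weight W is augmented with a learned low-rank update scaled by alpha/r, so the rank r controls the adapter's capacity and parameter count. All names and the toy dimensions here are our own illustration, not taken from the thesis or its training setup.

```python
# Illustrative LoRA forward pass in pure Python (no framework dependencies).
# LoRA keeps W frozen and learns a low-rank update dW = (alpha / r) * B @ A,
# where A is r x d_in and B is d_out x r, with r much smaller than d_in, d_out.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x): base output plus scaled low-rank update."""
    base = matvec(W, x)                 # frozen base projection
    low_rank = matvec(B, matvec(A, x))  # rank-r update path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, low_rank)]

# Toy example: d_in = d_out = 2, adapter rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (identity for clarity)
A = [[1.0, 1.0]]              # 1 x 2 down-projection
B = [[0.5], [0.5]]            # 2 x 1 up-projection
y = lora_forward(W, A, B, [2.0, 4.0], alpha=1.0, r=1)
```

The trainable parameter count scales with r * (d_in + d_out) rather than d_in * d_out, which is why rank selection trades adapter capacity against fine-tuning cost.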