Adapting Large Language Models to Domain-Specific Automated Program Repair
A. Ţerna (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. van Deursen – Graduation committee member (TU Delft - Software Engineering)
M. Izadi – Mentor (TU Delft - Software Engineering)
J. Yang – Graduation committee member (TU Delft - Web Information Systems)
Timur Galimzyanov – Mentor (JetBrains)
Sergey Titov – Mentor (JetBrains)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Automated program repair (APR) is increasingly critical in modern software development, yet language models (LMs) often struggle to capture repository-specific conventions and constraints. Small language models (SLMs) offer a cost-effective and deployable alternative, but their performance depends heavily on high-quality domain-specific supervision. In this work, we introduce a multi-teacher distillation pipeline that generates multi-turn repair trajectories, including both successful fixes and intermediate failures, to construct rich training datasets for method-level APR. We systematically analyze the impact of dataset size, repair diversity, fine-tuning strategies, hyperparameters, and reasoning supervision, aiming to identify efficient and reliable approaches for adapting SLMs to repository-specific repair tasks.
Our experiments demonstrate that parameter-efficient fine-tuning, particularly LoRA with carefully selected adapter ranks, achieves strong performance across reasoning and non-reasoning regimes while maintaining low computational cost. Explicit reasoning supervision is not required for high repair accuracy, but it significantly reduces reasoning trace lengths and inference costs. Dataset diversity and multi-turn trajectories are key to improving generalization and to bridging the gap between reasoning and non-reasoning inference. Overall, this study provides empirical insights into the practical adaptation of SLMs for repository-specific APR, showing how choices in dataset design, lightweight fine-tuning, and reasoning supervision influence performance in real-world contexts.
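To make the role of the adapter rank concrete, the following is a minimal, self-contained sketch of the LoRA idea referenced above: the frozen base weight W is augmented with a learned low-rank update scaled by alpha/r, so the rank r controls the adapter's capacity and parameter count. All names and the toy dimensions here are our own illustration, not taken from the thesis or its training setup.

```python
# Illustrative LoRA forward pass in pure Python (no framework dependencies).
# LoRA keeps W frozen and learns a low-rank update dW = (alpha / r) * B @ A,
# where A is r x d_in and B is d_out x r, with r much smaller than d_in, d_out.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x): base output plus scaled low-rank update."""
    base = matvec(W, x)                 # frozen base projection
    low_rank = matvec(B, matvec(A, x))  # rank-r update path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, low_rank)]

# Toy example: d_in = d_out = 2, adapter rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (identity for clarity)
A = [[1.0, 1.0]]              # 1 x 2 down-projection
B = [[0.5], [0.5]]            # 2 x 1 up-projection
y = lora_forward(W, A, B, [2.0, 4.0], alpha=1.0, r=1)
```

The trainable parameter count scales with r * (d_in + d_out) rather than d_in * d_out, which is why rank selection trades adapter capacity against fine-tuning cost.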