Evaluating Autonomous Coding Agents for Code Refactoring and Maintainability

A Large-Scale Study of Open-Source Software

Master Thesis (2026)

Author(s)

I. Joshi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. Izadi – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

R.M. Popescu – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

B. Özkan – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.A. Migut – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Refactoring, Code Quality, Empirical Study, Pull Requests, Agents, Open-Source Software, Large Language Model, Autonomous Coding Agents, Software Maintainability

To reference this document use

https://resolver.tudelft.nl/uuid:63f972c1-858d-4387-9468-114d77702ffc

Publication Year

2026

Language

English

Graduation Date

22-06-2026

Awarding Institution

Delft University of Technology

Programme

Computer Science, Data Science and Technology

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid adoption of autonomous coding agents raises a practical question for developers: is agent-authored code maintainable after merge? We present a large-scale empirical study of agent- and human-authored pull requests in open-source GitHub repositories, focusing on refactoring and maintainability. We construct a novel dataset of 4,392,818 agent-authored and 517,880 human-authored pull requests from 863,819 repositories, spanning 10 agents and 4 programming languages: C++, Java, JavaScript, and Python. Using a subset of 321,986 pull requests, we compare refactoring behavior, code smells, and maintainability metrics between agent- and human-authored contributions. We further examine how these outcomes vary across languages, repository popularity, and domains, and track post-merge evolution from 3 days to 2 months after merge to assess whether maintainability-related effects persist over time.

Our results show that agent-authored pull requests refactor less frequently and less diversely than human-authored pull requests, but their refactorings tend to affect larger code regions, especially in less popular repositories. Maintainability outcomes are mixed: agent-modified code is more likely to contain code smells after merge, while median metric changes remain context-dependent and broadly comparable to human-authored code. Longitudinally, agent-modified code shows similar maintainability trends after the early post-merge period, although agent-modified regions are revisited more frequently.

Files

2026_MScThesis_InaeshJoshi_Eva... (pdf)

License info not available

File under embargo until 01-01-2027