Evaluating Autonomous Coding Agents for Code Refactoring and Maintainability

A Large-Scale Study of Open-Source Software

Master Thesis (2026)
Author(s)

I. Joshi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. Izadi – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

R.M. Popescu – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

B. Özkan – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.A. Migut – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
22-06-2026
Awarding Institution
Delft University of Technology
Programme
Computer Science, Data Science and Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid adoption of autonomous coding agents raises a practical question for developers: is agent-authored code maintainable after merge? We present a large-scale empirical study of agent- and human-authored pull requests in open-source GitHub repositories, focusing on refactoring and maintainability. We construct a novel dataset of 4,392,818 agent-authored and 517,880 human-authored pull requests from 863,819 repositories, spanning 10 agents and 4 programming languages: C++, Java, JavaScript, and Python. Using a subset of 321,986 pull requests, we compare refactoring behavior, code smells, and maintainability metrics between agent- and human-authored contributions. We further examine how these outcomes vary across languages, repository popularity, and domains, and track post-merge evolution from 3 days to 2 months after merge to assess whether maintainability-related effects persist over time.

Our results show that agent-authored pull requests refactor less frequently and less diversely than human-authored pull requests, but their refactorings tend to affect larger code regions, especially in less popular repositories. Maintainability outcomes are mixed: agent-modified code is more likely to contain code smells after merge, while median metric changes remain context-dependent and broadly comparable to those of human-authored code. Longitudinally, agent-modified code follows maintainability trends similar to those of human-modified code after the early post-merge period, although agent-modified regions are revisited more frequently.

Files

License info not available

File under embargo until 01-01-2027