I. Joshi
Evaluating Autonomous Coding Agents for Code Refactoring and Maintainability
A Large-Scale Study of Open-Source Software
The rapid adoption of autonomous coding agents raises a practical question for developers: is agent-authored code maintainable after merge? We present a large-scale empirical study of agent- and human-authored pull requests in open-source GitHub repositories, focusing on refactoring and maintainability. We construct a novel dataset of 4,392,818 agent-authored and 517,880 human-authored pull requests from 863,819 repositories, spanning 10 agents and 4 programming languages: C++, Java, JavaScript, and Python. Using a subset of 321,986 pull requests, we compare refactoring behavior, code smells, and maintainability metrics between agent- and human-authored contributions. We further examine how these outcomes vary across languages, repository popularity, and domains, and track post-merge evolution from 3 days to 2 months after merge to assess whether maintainability-related effects persist over time.
Our results show that agent-authored pull requests refactor less frequently and less diversely than human-authored pull requests, but their refactorings tend to affect larger code regions, especially in less popular repositories. Maintainability outcomes are mixed: agent-modified code is more likely to contain code smells after merge, while median metric changes remain context-dependent and broadly comparable to human-authored code. Longitudinally, agent-modified code shows similar maintainability trends after the early post-merge period, although agent-modified regions are revisited more frequently.