Data-Driven Extract Method Recommendations: An Initial Study at ING

Master's Thesis (2021)
Author(s)

D. van der Leij (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Maurício Aniche – Mentor (TU Delft - Software Engineering)

Eelco Visser – Graduation committee member (TU Delft - Programming Languages)

Yaping Luo – Graduation committee member (ING Bank)

Publication Year
2021
Language
English
Copyright
© 2021 David van der Leij
Graduation Date
30-04-2021
Awarding Institution
Delft University of Technology
Programme
Computer Science
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Refactoring is the process of improving the structure of code without changing its functionality. Refactoring benefits software quality, but identifying refactoring opportunities remains challenging. This work employs machine learning on code quality metrics to predict the application of the Extract Method refactoring in an industry setting. We detect 919 examples of Extract Method in industry code and 986 examples where Extract Method was not applied, and compare these to open-source code. We find that feature distributions differ between industry and open-source code, especially for class-level metrics. We train models to predict Extract Method in industry code and find that Random Forests perform best, with class-level metrics contributing most to model performance. We then investigate whether models trained on an open-source dataset generalize to an industry setting and find that, although less performant than a custom-fit model, a Logistic Regression model performs well. Next, we examine whether these models generalize to unseen industry projects by validating on projects excluded from the training set; average performance is reasonable but lower than when training on the full industry dataset or on an open-source dataset. Lastly, we conduct a blind user study in which experts judge predictions made by our best model. Experts generally agree with the model's predictions: when they agree with a prediction to apply Extract Method, they cite high code complexity; when they agree with a prediction not to refactor, the most frequent reason is that the methods in question are already sufficiently understandable.
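To make the experimental setup concrete, the sketch below shows one way the classification and the leave-one-project-out validation described in the abstract could be implemented with scikit-learn. It is only an illustration, not the thesis' actual pipeline: the dataset file, the column names ("extracted", "project"), and the hyperparameters are assumptions.

# A minimal sketch of the setup described in the abstract, assuming a CSV of
# code-quality metrics with a binary label "extracted" (1 = Extract Method
# was applied) and a "project" column naming the project each sample came
# from. File name, column names, and hyperparameters are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("extract_method_dataset.csv")   # hypothetical file
X = data.drop(columns=["extracted", "project"])    # code-quality metrics
y = data["extracted"]                              # 1: refactored, 0: not
groups = data["project"]                           # project of each sample

# Random Forest: the best-performing model family in the study.
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Logistic Regression: the family that generalized best from
# open-source to industry code in the study.
lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Leave-one-project-out validation: each fold holds out all samples from
# one project, mimicking evaluation on an unseen industry project.
logo = LeaveOneGroupOut()
for name, model in [("Random Forest", rf), ("Logistic Regression", lr)]:
    scores = cross_val_score(model, X, y, groups=groups, cv=logo, scoring="f1")
    print(f"{name}: mean F1 over held-out projects = {scores.mean():.2f}")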
