Deep Just-in-Time Defect Prediction at Adyen

Master Thesis (2021)
Author(s)

N. van der Laan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Maurício Aniche – Mentor (TU Delft - Software Engineering)

Arie van Deursen – Graduation committee member (TU Delft - Software Technology)

Sicco Verwer – Graduation committee member (TU Delft - Cyber Security)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Niek van der Laan
More Info
Publication Year
2021
Language
English
Graduation Date
25-08-2021
Awarding Institution
Delft University of Technology
Programme
Computer Science | Software Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Finding defects in proposed changes is one of the main motivations for, and expected outcomes of, code review, yet in practice code review finds defects less often than expected. Just-in-time (JIT) defect prediction focuses on predicting bug-introducing changes, which can help allocate inspection time efficiently according to the defect-proneness of the changed software parts. Despite the promising results achieved by DeepJIT and CC2Vec, two deep learning-based JIT defect prediction models, industry-based JIT defect prediction studies have not yet applied deep models. In this work, the goal is to build and evaluate several JIT defect prediction models that can help Adyen developers spot defective changes during code review. To construct a new dataset with a large enough set of labels, we identify four sources of potential bug-fixing commits by analysing Adyen's way of working. We make several practical adaptations to DeepJIT and CC2Vec and compare their performance with that of three traditional metric-based models when making predictions at both commit level and file level. Our results indicate that deep models are able to outperform the metric-based models across all three datasets. All models performed slightly worse on Adyen data than in an open-source setting, but both deep models still achieved respectable performance and significantly outperformed the metric-based models. When evaluated in a real-world setting on bugs manually collected by Adyen developers, DeepJIT performed consistently with earlier findings at commit level, but its performance dropped at file level. Lastly, we find that although including each bug source generally does not lead to worse performance, whether it leads to better performance depends on both the type of model used and the granularity at which predictions are made.
