A Case for Deep Learning in Mining Software Repositories

Abstract

Repository mining researchers have successfully applied machine learning in a variety of scenarios. However, the use of deep learning in repository mining tasks is still in its infancy.
In this thesis, we describe the advantages and disadvantages of using deep learning in mining software repositories research and demonstrate them through two case studies on pull requests.
In the first, we train neural models to predict, on arrival, whether a pull request will be merged.
In the second, we train neural models to answer the question: given two pull requests, are they similar?
We show that, by using neural models, researchers can avoid feature engineering, because these models can be trained on raw data.
Furthermore, neural models have the potential to outperform traditional supervised machine learning models, because they can learn relevant features by themselves.
However, the power of neural models comes at a cost: their parameters are difficult to optimize, the models are difficult to explain, and training them is costly.
We therefore recommend that researchers consider neural architectures that perform well in other domains, such as natural language processing, before creating novel architectures.
Furthermore, it is important to include a less costly baseline when using neural models in research, to show that the power, and thereby the cost, of neural models is justified.