A Case for Deep Learning in Mining Software Repositories

Master Thesis (2017)

Author(s)

H.L.D. Nijessen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

G. Gousios – Mentor

C Hauff – Graduation committee member

Arie van Deursen – Graduation committee member

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Deep learning, Mining software repositories, Pull requests

To reference this document use:

https://resolver.tudelft.nl/uuid:fc0cf997-4900-435c-b213-00e5828490de

Publication Year

2017

Language

English

Graduation Date

10-11-2017

Awarding Institution

Delft University of Technology

Programme

Computer Science | Software Technology

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Repository mining researchers have successfully applied machine learning in a variety of scenarios. However, the use of deep learning in repository mining tasks is still in its infancy.
In this thesis, we describe the advantages and disadvantages of using deep learning in mining software repositories research and demonstrate them through two case studies on pull requests.
In the first, we train neural models to predict, on arrival, whether a pull request will be merged.
In the second, we train neural models to answer the question: given two pull requests, are they similar?
We show that by using neural models, researchers can avoid feature engineering, because these models can be trained on raw data.
Furthermore, neural models have the potential to outperform traditional supervised machine learning models, because they can learn relevant features by themselves.
However, the power of neural models comes at a cost: optimizing their parameters and explaining the models is difficult, and training them is costly.
We therefore recommend that researchers consider neural architectures that perform well in other domains, such as natural language processing, before creating novel architectures.
It is also important to include a less costly baseline when using neural models in research, to show that the power, and thereby the cost, of neural models is justified.

Files

Thesis_rnijessen_final.pdf

(pdf | 1.53 Mb)

License info not available