Predicting vulnerable files by using machine learning method

Master thesis (2018)

Authors

X. Shen (Electrical Engineering, Mathematics and Computer Science)

Contributors

S.E. Verwer (mentor)

P.H. Hartel (graduation committee member)

Maurício Aniche (graduation committee member)

Saeed Sedghi (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:899729ed-9b81-4973-a46a-18eca3131c8a

Published Date

27-09-2018

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Web applications have become increasingly popular around the globe, and a growing number of users rely on the functionality and information they provide. While solving complicated problems quickly and reliably is one of the main advantages of web applications, these platforms can adversely affect users' lives if malicious parties gain unauthorized control of them. A platform with more vulnerabilities is more likely to be attacked. This research focuses on building a prediction model for detecting vulnerabilities in web applications at eBay. Based on an analysis of important features, we dig deeper to find the decisive factors behind web application vulnerabilities. Using data from GitHub, we extract features related to source code files and developer networks, such as modification frequency, the number of developers involved, and the duration between two commits. By applying machine learning techniques to vulnerability prediction, we are able to provide reasonable suggestions to developers at an early stage of development. This can help produce relatively defect-free and well-documented software. In this thesis, we explain the prediction model in detail from the perspectives of code complexity, developers' behavior, and their networks. Moreover, based on the results of various classifiers, we offer possible causes of vulnerabilities and reasonable suggestions for avoiding them in the future. To conclude, the main contributions of this thesis are valuable feature engineering, the machine learning model, and applicable suggestions for predicting vulnerabilities effectively at eBay.
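As an illustration of the file-level features named in the abstract (modification frequency, number of developers involved, and duration between commits), the following is a minimal sketch and not the thesis's actual pipeline. The input format, a list of (file path, author, ISO timestamp) tuples, and the names `file_features`, `n_commits`, `n_developers`, and `mean_days_between_commits` are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def file_features(commits):
    """Compute simple per-file history features.

    `commits` is an iterable of (file_path, author, iso_timestamp) tuples.
    This record layout is hypothetical, chosen only for the sketch.
    """
    by_file = defaultdict(list)
    for path, author, ts in commits:
        by_file[path].append((datetime.fromisoformat(ts), author))

    features = {}
    for path, events in by_file.items():
        # Order each file's commits chronologically before measuring gaps.
        events.sort(key=lambda e: e[0])
        times = [t for t, _ in events]
        # Gap in days between each pair of consecutive commits.
        gaps = [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]
        features[path] = {
            "n_commits": len(events),                       # modification frequency
            "n_developers": len({a for _, a in events}),    # distinct authors
            "mean_days_between_commits": mean(gaps) if gaps else None,
        }
    return features


# Made-up sample history, for demonstration only.
sample = [
    ("a.py", "alice", "2018-01-01T00:00:00"),
    ("a.py", "bob", "2018-01-03T00:00:00"),
    ("a.py", "alice", "2018-01-07T00:00:00"),
    ("b.py", "bob", "2018-02-01T00:00:00"),
]
feats = file_features(sample)
```

A table of such per-file rows, combined with a label marking files known to have contained vulnerabilities, is the kind of input a classifier could be trained on.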

Files

Graduation_Thesis.pdf

(.pdf | 3.41 Mb)