Predicting vulnerable files by using machine learning method

Abstract

Web applications have been gaining popularity around the globe, and a growing number of users rely on the functionality and information they provide. While solving complicated problems quickly and reliably is one of the main advantages of web applications, these platforms can adversely affect users' lives if malicious parties gain unauthorized control of them. A platform with more vulnerabilities is more likely to be attacked. This research focuses on building a prediction model for detecting vulnerabilities in web applications at eBay. Based on an analysis of important features, we dig deeper to find the decisive factors behind web application vulnerabilities. Using data from GitHub, we extract features related to source code files and developer networks, such as modification frequency, the number of developers involved, and the duration between two commits. By applying machine learning techniques to vulnerability prediction, we are able to provide reasonable suggestions to developers at an early stage. This can help produce relatively defect-free and well-documented software. In this thesis, we explain the prediction model in detail from the aspects of code complexity, developers' behaviors, and their networks. Moreover, based on the results of various classifiers, we offer possible causes of vulnerabilities and reasonable suggestions for avoiding them in the future. To conclude, the main contributions of this thesis are valuable feature engineering, the machine learning model, and applicable suggestions for predicting vulnerabilities effectively at eBay.
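
The per-file features named in the abstract (modification frequency, number of developers involved, and duration between commits) can be illustrated with a minimal sketch. It is not the pipeline used in the thesis: the input format (tuples of file path, author, and ISO timestamp), the function name `file_features`, and the feature names are all assumptions made for illustration, and the sample commits are invented.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def file_features(commits):
    """Compute simple per-file features from commit records.

    `commits` is an iterable of (file_path, author, iso_timestamp) tuples.
    This record format is hypothetical, chosen only for this sketch.
    """
    by_file = defaultdict(list)
    for path, author, ts in commits:
        by_file[path].append((datetime.fromisoformat(ts), author))

    features = {}
    for path, entries in by_file.items():
        # Order each file's commits chronologically.
        entries.sort(key=lambda e: e[0])
        times = [t for t, _ in entries]
        # Gaps between consecutive commits, in days.
        gaps = [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]
        features[path] = {
            "n_commits": len(entries),                     # modification frequency
            "n_developers": len({a for _, a in entries}),  # distinct authors
            "mean_days_between_commits": mean(gaps) if gaps else None,
        }
    return features


# Invented sample data, for illustration only.
sample_commits = [
    ("a.py", "alice", "2020-01-01T00:00:00"),
    ("a.py", "bob", "2020-01-03T00:00:00"),
    ("a.py", "alice", "2020-01-07T00:00:00"),
    ("b.py", "bob", "2020-01-02T00:00:00"),
]
features = file_features(sample_commits)
```

A table of such features, joined with labels marking which files were later found vulnerable, is the kind of input a classifier could be trained on; the choice of classifier and labels is left open here because the abstract does not specify them.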