Predicting True Vulnerabilities from Static Analyzer Warnings in Industry

An Attempt at Releasing Software Faster in Industry


Abstract

An increasingly digital world comes with many benefits, but unfortunately also with drawbacks. As the digital world grows, so do the amounts of data and software. Developing more software also means a higher probability of vulnerabilities, which adversaries can exploit. Adversaries take advantage of users and software vulnerabilities by stealing data, causing harm, stealing money, and so on. This makes the digital world a dangerous environment.
To ensure their software contains as few vulnerabilities as possible, companies invest in tools and experts to check their software for vulnerabilities. One such company is ING, the largest bank in the Netherlands. ING uses Fortify, a static analyzer. The problem with this tool is that it produces many false positives; pentesters and developers therefore have to check all warnings given by Fortify manually, which takes a lot of time and slows down the whole software development process. In this study, we propose using supervised machine learning techniques to predict true vulnerabilities from static analyzer warnings. Using ING's data from Fortify, two highly imbalanced datasets with code metrics are created, one at class level and one at method level. Various classifiers and sampling techniques are compared to determine which perform best. We also compare performance across the two levels of granularity. Finally, we investigate whether a dataset containing multiple vulnerability types performs better than a dataset consisting of a single vulnerability type. Our study shows that Bagging combined with ClassBalancer gives the best f-measure (0.618) for the class-level dataset, which we consider moderately good. Random Forest with SMOTE gives the best f-measure (0.412) for the method-level dataset, which we consider weak. Depending on the vulnerability type, performance can benefit from a dataset per vulnerability type. Overall, the results of this study are modestly promising for using Fortify in combination with supervised machine learning, especially compared to using Fortify alone.
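The pipeline described above can be sketched in a minimal, illustrative form. This is not the thesis code: ING's Fortify data is not public, so the snippet below uses a synthetic, highly imbalanced dataset as a stand-in, implements a from-scratch SMOTE-style oversampler, and evaluates a Random Forest with the f-measure. All names, parameters, and numbers here are assumptions chosen for illustration.

```python
# Illustrative sketch only: a SMOTE-style oversampler combined with Random
# Forest on a synthetic imbalanced dataset, evaluated by f-measure.
# The real study used ING's Fortify warnings with code metrics (not public).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating each chosen
    sample with one of its k nearest minority-class neighbors (SMOTE)."""
    rng = np.random.default_rng(seed)
    # Neighbor index 0 is the sample itself, so ask for k + 1 neighbors.
    idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)[1]
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))           # pick a minority sample
        j = idx[i][rng.integers(1, k + 1)]     # pick one of its k neighbors
        out.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.asarray(out)

# Synthetic stand-in: ~5% positives mimics a highly imbalanced dataset.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample only the training set, so the test set keeps its true imbalance.
X_min = X_tr[y_tr == 1]
n_new = int((y_tr == 0).sum()) - len(X_min)    # bring classes to parity
X_bal = np.vstack([X_tr, smote(X_min, n_new)])
y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
print(f"f-measure on minority class: {f1_score(y_te, clf.predict(X_te)):.3f}")
```

Applying the oversampler only to the training split is the important design choice here: rebalancing the test set would inflate the f-measure, since real warning streams keep their skewed class distribution.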