Vulnerability prealerting by monitoring the online repositories of open source projects

Abstract

Software security plays a crucial role in a modern world governed by software. Closed source projects can address security issues with a degree of confidentiality, whereas open source projects address them publicly, even though just as many projects rely on them. In 50% of documented cases, the vulnerabilities could have been spotted almost 20 days before their disclosure, leaving plenty of time for a potential attacker to exploit the weakness.

Based on the results of a basic text search, we conclude that the majority of security-related activity is a reaction to known vulnerabilities and that maintainers do not always mention security terms when fixing vulnerabilities. We also confirm that many security-labeled issues are not submitted to vulnerability systems, even though the maintainers recognize their security aspect. Although commit classification models can spot security-related commits automatically, they struggle in realistic scenarios, and no particular feature or sampling method is vastly better than the others. In our evaluation, state-of-the-art models spot security-related commits with an F1 score of 0.36.
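As an illustration only, the kind of basic text search referred to above can be pictured as a keyword match over commit messages. The following is a minimal sketch, not the procedure used in this work: the term list, the function names, and the sample messages are all hypothetical.

```python
import re

# Hypothetical term list; the search terms actually used are not stated here.
SECURITY_TERMS = ["security", "vulnerab", "exploit", "cve", "xss", "overflow"]

# One case-insensitive pattern that matches any term at the start of a word.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in SECURITY_TERMS) + r")",
    re.IGNORECASE,
)


def mentions_security(message: str) -> bool:
    """Return True if the message contains any of the security terms."""
    return _PATTERN.search(message) is not None


def flag_messages(messages):
    """Return the messages that mention a security term, in input order."""
    return [m for m in messages if mentions_security(m)]


# Made-up commit messages, for illustration only.
SAMPLE_MESSAGES = [
    "Fix buffer overflow in parser",
    "Update README",
    "Address CVE report in auth module",
    "Validate input length before copy",
]
```

In this sketch, `flag_messages(SAMPLE_MESSAGES)` keeps the first and third messages. The last message could well be a security fix, yet it contains none of the terms and is missed, which is the kind of silent fix a plain keyword search cannot catch.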

Given these findings, we conclude that security-related activity is hard to distinguish automatically from everyday development activity and that manual review is required to spot these traces. The proposed methods can make this review easier. We suggest that more attention be given to open source security to avoid early public traces of vulnerabilities.