Exploring the Generation and Detection of Weaknesses in LLM Generated Code
LLMs can not be trusted to produce secure code, but they can detect it
I. Vasiliauskas (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Ali Al-Kaswan – Mentor (TU Delft - Software Engineering)
A Van Deursen – Graduation committee member (TU Delft - Software Engineering)
Maliheh Izadi – Graduation committee member (TU Delft - Software Engineering)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Large Language Models (LLMs) have gained a lot of popularity for code generation in recent years. Developers might use LLM-generated code in projects where the security of software matters. A relevant question is therefore: what is the prevalence of code weaknesses in LLM-generated code, and can we use LLMs to detect them? In this research, we generate prompts based on a taxonomy of code weaknesses and run them on multiple LLMs with varying properties. We evaluate the results on the existence of insecurities both manually and by the LLMs themselves. We can conclude that even when LLMs are not provoked and asked benign realistic requests, they often generate code containing known software weaknesses. We find a correlation between model parameter size and the percentage of secure answers. However, they are exceptionally successful in recognizing these insecurities themselves. Future work should focus on a wider set of models and a larger set of prompts, to get more results on this subject.