Red-Teaming Code LLMs for Malware Generation

Bachelor Thesis (2024)
Author(s)

C. Ionescu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. van Van Deursen – Mentor (TU Delft - Software Engineering)

Maliheh Izadi – Mentor (TU Delft - Software Engineering)

Ali Al-Kaswan – Mentor (TU Delft - Software Engineering)

K. Liang – Graduation committee member (TU Delft - Cyber Security)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2024
Language
English
Graduation Date
28-06-2024
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project']
Programme
['Computer Science and Engineering']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large Language Models (LLMs) are increasingly used in software development, but their potential for misuse in generating harmful code, such as malware, raises significant concerns. We present a red-teaming approach to assess the safety and ethical alignment of LLMs in the context of code generation, in particular how it applies to the generation of malware. By developing a dataset of prompts that are likely to elicit harmful behavior from the LLMs, we aim to provide a valuable resource for benchmarking the harmlessness factor of these models. Using this dataset, we evaluate multiple state-of-the-art open-source LLMs, analyzing factors such as model size, training alignment, and prompt specificity. Our findings show that LLMs vary significantly in their likelihood to generate harmful code, depending on factors like training data, alignment techniques, and prompt specificity. Furthermore, we demonstrate that system prompts could significantly alter the model's response to potentially harmful queries. We also demonstrate the efficacy of using LLMs to evaluate the harmlessness of other LLMs' responses. This research highlights the importance of ongoing development of safety measures to mitigate the risks associated with code-generating LLMs.

Files

License info not available