Red-Teaming Code LLMs for Malware Generation

None, None

Red-Teaming Code LLMs for Malware Generation

Bachelor Thesis (2024)

Author(s)

C. Ionescu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. van Van Deursen – Mentor (TU Delft - Software Engineering)

Maliheh Izadi – Mentor (TU Delft - Software Engineering)

Ali Al-Kaswan – Mentor (TU Delft - Software Engineering)

K. Liang – Graduation committee member (TU Delft - Cyber Security)

Faculty

Electrical Engineering, Mathematics and Computer Science

Large Language Models LLM AI ethics Generative AI impact Malware HHH LLM4Code

To reference this document use:

https://resolver.tudelft.nl/uuid:bb5a6ebe-d5b5-4563-aa29-8ad662a0f732

More Info

expand_more

Publication Year

2024

Language

English

Graduation Date

28-06-2024

Awarding Institution

Delft University of Technology

Project

['CSE3000 Research Project']

Programme

['Computer Science and Engineering']

Faculty

Electrical Engineering, Mathematics and Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large Language Models (LLMs) are increasingly used in software development, but their potential for misuse in generating harmful code, such as malware, raises significant concerns. We present a red-teaming approach to assess the safety and ethical alignment of LLMs in the context of code generation, in particular how it applies to the generation of malware. By developing a dataset of prompts that are likely to elicit harmful behavior from the LLMs, we aim to provide a valuable resource for benchmarking the harmlessness factor of these models. Using this dataset, we evaluate multiple state-of-the-art open-source LLMs, analyzing factors such as model size, training alignment, and prompt specificity. Our findings show that LLMs vary significantly in their likelihood to generate harmful code, depending on factors like training data, alignment techniques, and prompt specificity. Furthermore, we demonstrate that system prompts could significantly alter the model's response to potentially harmful queries. We also demonstrate the efficacy of using LLMs to evaluate the harmlessness of other LLMs' responses. This research highlights the importance of ongoing development of safety measures to mitigate the risks associated with code-generating LLMs.

Files

CSE3000_Final_Paper_Ciprian.pd... (pdf)

(pdf | 0.209 Mb)

License info not available