Implications of LLMs4Code on Copyright Infringement

An Exploratory Study Through Red Teaming

Bachelor Thesis (2024)
Author(s)

B. Koc (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Ali Al-Kaswan – Mentor (TU Delft - Software Engineering)

Maliheh Izadi – Mentor (TU Delft - Software Engineering)

A. van Deursen – Mentor (TU Delft - Software Engineering)

K. Liang – Graduation committee member (TU Delft - Cyber Security)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
28-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large Language Models (LLMs) have seen a rapid increase in usage across numerous sectors in recent years. However, this growth brings a greater risk of misuse. This paper explores copyright infringement facilitated by LLMs in the domain of software engineering. Through the creation of a taxonomy and prompt engineering, we investigate how model alignment and the structure and language of prompts affect the way LLMs respond to copyright-infringing prompts, assessing their willingness to engage in copyright violation. Our findings underscore the critical role of model alignment in identifying potentially infringing inputs, irrespective of model complexity or modality. Notably, prompts crafted to avoid overtly malicious language, especially those that instruct the model to complete the given input, tend to yield more responses that could facilitate malicious activities. This research provides a preliminary understanding of copyright infringement by LLMs in software engineering and suggests avenues for future research.
