The (ab)use of Open Source Code to Train Large Language Models

Title: The (ab)use of Open Source Code to Train Large Language Models
Author: Al-Kaswan, A. (TU Delft Software Engineering); Izadi, M. (TU Delft Software Engineering)
Date: 2023

Abstract: In recent years, Large Language Models (LLMs) have gained significant popularity due to their ability to generate human-like text and their potential applications in various fields, such as Software Engineering. LLMs for Code are commonly trained on large unsanitized corpora of source code scraped from the Internet. The content of these datasets is memorized and emitted by the models, often in a verbatim manner. In this work, we will discuss the security, privacy, and licensing implications of memorization. We argue why the use of copyleft code to train LLMs is a legal and ethical dilemma. Finally, we provide four actionable recommendations to address this issue.

To reference this document use: http://resolver.tudelft.nl/uuid:44972319-b624-470a-8ea6-fa758ac6cec3
DOI: https://doi.org/10.1109/NLBSE59153.2023.00008
Embargo date: 2024-02-05
ISBN: 979-8-3503-0178-6
Source: Proceedings of the 2nd International Workshop on NL-based Software Engineering
Event: 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE), 2023-05-14 → 2023-05-20, Melbourne, Australia
Series: Proceedings - 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering, NLBSE 2023
Bibliographical note: Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Part of collection: Institutional Repository
Document type: conference paper
Rights: © 2023 A. Al-Kaswan, M. Izadi

Files:
PDF NLBSE_Position_Paper_2_.pdf (176.87 KB)
PDF The_abuse_of_Open_Source_ ... Models.pdf (241.98 KB)