Breaking the Silence: the Threats of Using LLMs in Software Engineering
J. Sallou (TU Delft - Software Engineering)
Thomas Durieux (TU Delft - Software Engineering)
Annibale Panichella (TU Delft - Software Engineering)
Abstract
Large Language Models (LLMs) have gained considerable traction within the Software Engineering (SE) community, impacting various SE tasks, from code completion to test generation, and from program repair to code summarization. Despite their promise, researchers must still exercise caution, as numerous intricate factors can influence the outcomes of experiments involving LLMs.
This paper initiates an open discussion on potential threats to the validity of LLM-based SE research, including the use of closed-source models, possible data leakage between LLM training data and research evaluation benchmarks, and the reproducibility of LLM-based findings.
In response, this paper proposes a set of guidelines tailored for SE researchers and Language Model (LM) providers to mitigate these concerns.
The implications of these guidelines are illustrated using existing good practices adopted by LLM providers and a practical example for SE researchers in the context of test case generation.