Reducing LLM Hallucinations with Retrieval Prompt Engineering

Minimising the Need for Re-prompting in Automatic Understandable Test Generation

Bachelor Thesis (2024)
Author(s)

A. Mentzelopoulou (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Deljouyi – Mentor (TU Delft - Software Engineering)

A. Zaidman – Mentor (TU Delft - Software Technology)

A. Katsifodimos – Graduation committee member (TU Delft - Data-Intensive Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
25-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automated test generation helps produce correct and usable code while keeping the development process efficient and effective. UTGen is a tool that uses a Large Language Model (LLM) to improve the understandability of test suites generated by a Search-Based Software Testing tool, namely EvoSuite. While attempting to improve a given test case, the LLM often generates code that strays too far from the original, changing the test's purpose, or code that does not compile at all. Such behaviour is called "LLM hallucination".

UTGen's current hallucination handling is time-consuming and resource-intensive. To address this, we propose two alternative approaches that use information-retrieval prompt engineering techniques to minimise hallucinations: incorporating the source code under test into the LLM prompt, and incorporating the errors thrown by the most recently generated test case. We assess both methods in a comparison study against the base version of UTGen. We observe that source-code retrieval improves the generation of compilable test cases for complex classes. Error retrieval shows hallucination performance similar to base UTGen, with fewer re-prompts for classes with a high normalised Lack of Cohesion of Methods (LCOM*).
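To make the two retrieval techniques concrete, the Python sketch below shows one way such a prompt could be assembled. It is an illustration under assumed names only: build_prompt, TestAttempt, and the exact prompt wording are hypothetical and do not come from UTGen's code base.

from typing import Optional
from dataclasses import dataclass

@dataclass
class TestAttempt:
    test_code: str        # last test case produced by the LLM
    compiler_errors: str  # errors raised when compiling/running it, if any

def build_prompt(original_test: str,
                 source_under_test: str,
                 last_attempt: Optional[TestAttempt] = None) -> str:
    """Assemble an improvement prompt grounded in retrieved context.
    Hypothetical sketch; not UTGen's actual implementation."""
    parts = [
        "Improve the readability of the following test case without changing its purpose:",
        original_test,
        # Source-code retrieval: showing the class under test discourages
        # the model from inventing methods or fields that do not exist.
        "The class under test is:",
        source_under_test,
    ]
    if last_attempt is not None and last_attempt.compiler_errors:
        # Error retrieval: feeding back the latest failure lets the model
        # repair it instead of repeating the same hallucination.
        parts += [
            "Your previous attempt failed to compile:",
            last_attempt.test_code,
            "with the following errors:",
            last_attempt.compiler_errors,
            "Fix these errors while preserving the test's original behaviour.",
        ]
    return "\n\n".join(parts)

On the first attempt the error section is simply omitted; on each re-prompt the latest compiler output would be appended, which corresponds to the error-retrieval mechanism the study evaluates.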

Index Terms - Automated Test Generation, Large Language Models (LLMs), LLM Hallucination, Prompt Engineering

Files

FinalPaper.pdf
(PDF, 0.832 MB)
License info not available