Exploring Test Suite Coverage of Large Language Model–Enhanced Unit Test Generation

A Study on the Ability of Large Language Models to Improve the Understandability of Generated Unit Tests Without Compromising Coverage

Bachelor Thesis (2024)
Author(s)

A. Drăgoi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Andy Zaidman – Mentor (TU Delft - Software Technology)

A. Deljouyi – Mentor (TU Delft - Software Engineering)

A. Katsifodimos – Graduation committee member (TU Delft - Data-Intensive Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
25-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automated software testing is a widely studied topic in the research literature. Search-based software testing tools, such as EvoSuite, use genetic algorithms to generate test suites without the developer’s input. Large Language Models (LLMs) have recently attracted significant attention in the software engineering domain for their potential to automate test generation. UTGen, a tool that integrates LLMs with EvoSuite, produces more understandable tests than EvoSuite alone; however, the tests it generates achieve lower coverage.

To streamline bug detection by developers, we propose UTGenCov, a concept that focuses on improving the understandability of EvoSuite-generated tests without compromising coverage. UTGenCov builds on UTGen by analyzing the reasons behind the decrease in coverage and proposing an alternative approach.

Our investigation determined that the leading cause of coverage reduction in UTGen is LLM hallucination in the Understandability phase. UTGenCov aims to address hallucinations by providing the LLM with the source code of the methods used in the test. However, our experimental results indicate inconsistent performance and a further decrease in branch coverage of 0.74% compared to UTGen.
