evoLLve'M: Improving JUnit Test Assertions and Mutation Score Using ChatGPT-4o and EvoSuite

Bachelor Thesis (2024)

Author(s)

D.A. Turhan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.J.G. Olsthoorn – Mentor (TU Delft - Software Engineering)

A. Panichella – Graduation committee member (TU Delft - Software Engineering)

Faculty

Electrical Engineering, Mathematics and Computer Science

Search-Based Software Testing, Software Testing, Large Language Models (LLMs)

To reference this document use:

https://resolver.tudelft.nl/uuid:a672e55b-5fa3-4512-a706-4d96a259b2a7

Publication Year

2024

Language

English

Graduation Date

01-07-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Software testing is a vital yet time-consuming part of the development lifecycle, which often leads engineers to limit its use in practice. To encourage active software testing, researchers have made significant advances in automatic unit test case generation with approaches such as search-based testing (e.g., EvoSuite) and large language models (e.g., ChatGPT). However, while the former struggles to explore edge cases of the input space, the latter still suffers from hallucinations during code synthesis, limiting the use of both solutions. This research aims to overcome these limitations by combining the strengths of both techniques, namely effective test structure generation and program inference, respectively. In particular, the assertions of initial unit tests generated by EvoSuite are augmented using ChatGPT-4o, with the aim of improving the mutation score and hence the overall test suite effectiveness. We evaluate our solution, called evoLLve’M, on a benchmark of 20 Java classes from the SourceForge110 Corpus and compare it to using EvoSuite alone, which is considered the state-of-the-art approach. Results show that evoLLve’M outperforms EvoSuite in mutation score for 25% of the classes, without negatively impacting the other classes. It boosts the total number of killed mutations by 3%, with the largest improvements for the increment and null-return mutation types, at 26.9% and 8.9%, respectively.
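
To illustrate why stronger assertions can raise the mutation score, the following is a minimal sketch and is not taken from the thesis. It assumes a hypothetical one-line method under test (an increment), three hand-written mutants standing in for what a mutation testing tool would generate, and two hypothetical tests modelled as predicates: one with a weak assertion and one with an exact-value assertion of the kind an augmentation step might produce. The mutation score is computed as the fraction of mutants on which the test fails.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;
import java.util.function.Predicate;

public class MutationScoreSketch {

    // Hypothetical method under test: increments its argument.
    public static final IntUnaryOperator ORIGINAL = x -> x + 1;

    // Hand-written mutants (assumed for illustration only).
    public static final List<IntUnaryOperator> MUTANTS = List.of(
            x -> x - 1, // increment replaced by decrement
            x -> x,     // increment removed
            x -> 0      // return value replaced by a constant
    );

    // Weak test: only checks that the result is non-negative.
    public static final Predicate<IntUnaryOperator> WEAK_TEST =
            f -> f.applyAsInt(5) >= 0;

    // Strengthened test: asserts the exact expected value.
    public static final Predicate<IntUnaryOperator> STRONG_TEST =
            f -> f.applyAsInt(5) == 6;

    // Mutation score = killed mutants / total mutants.
    // A mutant is killed when the test fails on it.
    public static double mutationScore(Predicate<IntUnaryOperator> test) {
        if (!test.test(ORIGINAL)) {
            throw new IllegalStateException("Test must pass on the original code");
        }
        long killed = MUTANTS.stream().filter(m -> !test.test(m)).count();
        return (double) killed / MUTANTS.size();
    }

    public static void main(String[] args) {
        System.out.println("weak assertion score:   " + mutationScore(WEAK_TEST));
        System.out.println("strong assertion score: " + mutationScore(STRONG_TEST));
    }
}
```

In this toy setting the weak assertion passes on every mutant, so none are killed, while the exact-value assertion fails on all three. Real test suites and mutation tools are far less clear-cut, which is consistent with the modest overall gains reported above.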

Files

Cover_page_2.pdf

(pdf | 0.89 Mb)

License info not available