Software testing is a vital yet time-consuming part of the development lifecycle, which often leads engineers to limit its use in practice. To encourage active software testing, researchers have made significant advances in automatic unit test case generation with approaches such as search-based testing (e.g., EvoSuite) and large language models (e.g., ChatGPT). However, while the former struggles to explore edge cases of the input space, the latter still suffers from hallucinations during code synthesis, limiting the use of both solutions. This research aims to overcome these limitations by combining the strengths of both techniques, which are effective test structure generation and program inference, respectively. In particular, the assertions of initial unit tests generated by EvoSuite are augmented using ChatGPT-4o, with the aim of improving the mutation score and hence the overall test suite effectiveness. We evaluate our solution, called evoLLve’M, on a benchmark of 20 Java classes from the SourceForge110 Corpus and compare it to using EvoSuite alone, which is considered the state-of-the-art approach. Results show that evoLLve’M achieves a higher mutation score than EvoSuite on 25% of the classes, without negatively affecting the remaining classes. It increases the total number of killed mutants by 3%, with the largest improvements for the increment and null-return mutation types, at 26.9% and 8.9%, respectively.
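
To make the link between assertions and mutation score concrete, the following is a minimal illustrative sketch and not the evoLLve’M implementation. It assumes a toy increment operation and two hand-written mutants standing in for what a mutation tool might generate; the class name `MutationScoreSketch` and all of its members are hypothetical. A test that only executes the code kills no mutants, whereas the same test with an added assertion on the returned value kills both.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;
import java.util.function.Predicate;

public class MutationScoreSketch {
    // Hypothetical code under test: advance a counter by one.
    static final IntUnaryOperator ORIGINAL = x -> x + 1;

    // Hand-written stand-ins for mutants (assumption: a real tool would generate these).
    static final List<IntUnaryOperator> MUTANTS = List.<IntUnaryOperator>of(
        x -> x - 1, // increment replaced by decrement
        x -> x      // increment removed
    );

    // A test is modelled as a predicate that returns true when it passes.
    // Weak test: executes the code but asserts nothing about the result.
    static final Predicate<IntUnaryOperator> WEAK_TEST = impl -> {
        impl.applyAsInt(41);
        return true;
    };

    // Augmented test: the same call plus an assertion on the returned value.
    static final Predicate<IntUnaryOperator> STRONG_TEST =
        impl -> impl.applyAsInt(41) == 42;

    // Mutation score = killed mutants / total mutants.
    // A mutant counts as killed when the test fails on it.
    static double mutationScore(Predicate<IntUnaryOperator> test) {
        long killed = MUTANTS.stream().filter(m -> !test.test(m)).count();
        return (double) killed / MUTANTS.size();
    }

    public static void main(String[] args) {
        System.out.println("weak:   " + mutationScore(WEAK_TEST));
        System.out.println("strong: " + mutationScore(STRONG_TEST));
    }
}
```

Both tests pass on `ORIGINAL`, so they are equally valid as regression tests; only the augmented one distinguishes the mutants from the original, raising the score in this toy setting from 0.0 to 1.0.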