Mutation testing is a way to measure the effectiveness of a test suite at catching bugs in a given piece of code. Writing such tests manually can be cumbersome and time-consuming. Automated tools can generate tests that achieve a high mutation score, but their output is often very hard for humans to understand and is therefore rarely adopted as an actual test suite for software programs. Because LLMs have been shown to generate programs that humans can understand more easily, we ask whether they can be used to improve or generate tests for the purpose of mutation testing. LLMs run either in the cloud or locally. Cloud-based LLMs such as ChatGPT or Copilot require no hardware of one's own, but are not always an option because of privacy concerns, speed, or regulations. Local LLMs avoid the privacy concerns, but can require substantial hardware. This paper focuses on local LLMs that can be run in a computationally restricted environment. We present an automated approach that uses a local LLM to improve the mutation score of existing test suites. We compare three models (DeepSeek Coder, Code Llama, and Codestral), evaluated on publicly available datasets. Using this approach, we successfully generated unit tests that, combined with the existing manually written tests, increase the mutation score in roughly one third to one half of cases, depending on the model.
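
To make the underlying idea concrete, here is a minimal, hypothetical sketch in Python (not taken from the paper; the function and test names are illustrative) of what mutation testing looks like: a mutation tool changes one operator in the code under test, and a test "kills" the mutant if it fails on the mutated version while passing on the original.

    # Hypothetical example: a function under test, a typical operator
    # mutant, and a boundary test that kills it.

    def is_adult(age: int) -> bool:
        # Original code under test.
        return age >= 18

    def is_adult_mutant(age: int) -> bool:
        # Mutant: the tool replaced `>=` with `>`.
        return age > 18

    def test_is_adult_boundary():
        # Passes on the original but would fail if `is_adult` were
        # replaced by the mutant, since they disagree at the boundary
        # value 18 -- so this test kills the mutant.
        assert is_adult(18) is True

    # Mutation score = killed mutants / total mutants; a suite that never
    # checks the boundary value would leave this mutant alive.

A generated test such as test_is_adult_boundary is only useful in practice if developers can read and maintain it, which is why the readability of LLM-generated tests matters here.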