PyTestGuard: An IDE-Integrated Tool for Supporting Developers with LLM-Generated Unit Tests

Master Thesis (2025)
Author(s)

N. Mouman (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Carolin Brandt – Mentor (TU Delft - Software Engineering)

A. Panichella – Mentor (TU Delft - Software Engineering)

W.P. Brinkman – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
12-09-2025
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Unit testing is an important step in the software development workflow to detect bugs and ensure system correctness. Recently, Large Language Models (LLMs) have been explored to automate unit test generation and have demonstrated promising results. However, the generated tests are not always reliable, as they may contain syntax errors, hallucinations, test smells, or failing assertions. We conjecture that providing developers with feedback on such issues will increase the adoption of LLMs in real-world workflows. To address this, we propose PyTestGuard, a PyCharm plugin that allows developers to generate and refine unit tests directly within the Integrated Development Environment (IDE). Beyond test generation, PyTestGuard helps users evaluate test quality by detecting test smells and reporting issues such as missing arguments or references to non-existing objects. We conducted a user study with nine participants to assess PyTestGuard’s usefulness as a testing assistant and to identify areas for improvement. Participants reported that the tool’s feedback on test quality, along with its summarised error messages and coverage information, supported them while writing unit tests. However, they also encountered challenges and suggested improvements that would need to be addressed before they would fully trust LLM-based test generation in their development workflow. Based on these findings, we highlight several design recommendations for future tools that aim to integrate LLMs into software testing workflows.
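
To illustrate the kind of test-quality feedback the abstract refers to, the sketch below shows one simplified check: flagging generated pytest-style test functions that contain no assertion, a common test smell. This is a minimal, hypothetical example for illustration only, not PyTestGuard's actual implementation; the function name and the sample generated code are assumptions.

import ast

def find_assertionless_tests(source: str) -> list[str]:
    """Return names of pytest-style test functions without any assert statement.

    A simplified stand-in for one kind of quality check a tool like
    PyTestGuard could perform on LLM-generated tests.
    """
    tree = ast.parse(source)
    smelly = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
            # A test body with no assert statement cannot verify behaviour.
            if not any(isinstance(n, ast.Assert) for n in ast.walk(node)):
                smelly.append(node.name)
    return smelly

if __name__ == "__main__":
    generated = '''
def test_addition():
    assert 1 + 1 == 2

def test_logging():
    print("no assertion here")  # missing assertion: a common test smell
'''
    print(find_assertionless_tests(generated))  # ['test_logging']

In practice, such a check would run alongside others (e.g. resolving referenced names against the project under test to catch calls to non-existing objects) before the generated tests are presented to the developer.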
