An Empirical Study of Assertion Generation Strategies for LLM-Based Test Oracles
V. Mitseva (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. Panichella – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.J.G. Olsthoorn – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. Voulimeneas – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Unit test assertions are essential for detecting software faults, yet writing them remains costly and time-consuming. Large Language Models (LLMs) offer a promising way to automate assertion generation. However, prior work has primarily focused on generating assertions that closely mimic human-written ones. Because this represents only one possible generation strategy, the impact of alternative approaches on overall quality remains poorly understood. This paper presents an empirical study evaluating four distinct generation strategies: Assertion Generation, which was proposed and evaluated in prior work, alongside Assertion Augmentation, Blind Augmentation, and Chain-of-Thought Generation. Using GPT-oss 20b as the underlying model, we evaluate these strategies on 811 test oracles from 10 open-source projects in the GitBug-Java benchmark. We assess the generated assertions in terms of correctness, fault-detection capability, and textual similarity to developer-written assertions. Our results show that the choice of generation strategy strongly influences performance. Assertion Augmentation performs best overall, achieving the highest compilation rate, execution validity, and mutation score. Meanwhile, Chain-of-Thought Generation detects the highest proportion of real bugs, and standalone Assertion Generation yields results most similar to developer-written tests. Overall, the findings demonstrate that providing LLMs with existing developer-written assertions substantially improves the quality and effectiveness of generated test oracles.