V. Mitseva

Unit test assertions are essential for detecting software faults, yet writing them remains costly and time-consuming. Large Language Models (LLMs) offer a promising way to automate assertion generation. However, prior work has primarily focused on generating assertions that closely mimic human-written ones. Because this represents only one possible generation strategy, the impact of alternative approaches on overall quality remains poorly understood. This paper presents an empirical study evaluating four distinct generation strategies: Assertion Generation, which was proposed and evaluated in prior work, alongside Assertion Augmentation, Blind Augmentation, and Chain-of-Thought Generation. Using GPT-oss 20b as the underlying model, we evaluate these strategies on 811 test oracles from 10 open-source projects in the GitBug-Java benchmark. We assess the generated assertions in terms of correctness, fault-detection capability, and textual similarity to developer-written assertions. Our results show that the choice of generation strategy strongly influences performance. Assertion Augmentation performs best overall, achieving the highest compilation rate, execution validity, and mutation score. Meanwhile, Chain-of-Thought Generation detects the highest proportion of real bugs, and standalone Assertion Generation yields results most similar to developer-written tests. Overall, the findings demonstrate that providing LLMs with existing developer-written assertions substantially improves the quality and effectiveness of generated test oracles. ...
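The abstract names three aggregate measures (compilation rate, execution validity, and mutation score) without giving their formulas. The sketch below shows one common way such ratios are computed. It is an illustration under assumptions, not the study's tooling: the `AssertionOutcome` record, its field names, and the choice to count mutants only for assertions that compile and pass are all hypothetical, and the paper's exact definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class AssertionOutcome:
    """Hypothetical record for one generated assertion (not from the paper)."""
    compiles: bool                  # the test compiles with the generated assertion
    passes_on_fixed_version: bool   # the test passes on the non-buggy code
    mutants_total: int              # mutants the test was run against
    mutants_killed: int             # mutants on which the test failed


def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summarize(outcomes: list[AssertionOutcome]) -> dict[str, float]:
    """Aggregate per-assertion outcomes into the three ratios."""
    compiled = [o for o in outcomes if o.compiles]
    # Assumption: only assertions that compile and pass count as valid.
    valid = [o for o in compiled if o.passes_on_fixed_version]
    killed = sum(o.mutants_killed for o in valid)
    total = sum(o.mutants_total for o in valid)
    return {
        "compilation_rate": rate(len(compiled), len(outcomes)),
        "execution_validity": rate(len(valid), len(outcomes)),
        "mutation_score": rate(killed, total),
    }
```

Calling `summarize` on a list of outcomes returns a dictionary with one value per measure, each in the range 0 to 1. Dividing execution validity by all generated assertions, rather than only those that compile, is a design choice made here for simplicity.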