An Empirical Study of Assertion Generation Strategies for LLM-Based Test Oracles

Bachelor Thesis (2026)

Author(s)

V. Mitseva (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Panichella – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.J.G. Olsthoorn – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Voulimeneas – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Software Testing; Large Language Models (LLMs); Mutation Testing; Test-Assertion Generation; Assertion Generation

To reference this document, use:

https://resolver.tudelft.nl/uuid:f3877914-bd2e-4007-945b-c0db271d27c8

Publication Year

2026

Language

English

Graduation Date

23-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Unit test assertions are essential for detecting software faults, yet writing them remains costly and time-consuming. Large Language Models (LLMs) offer a promising way to automate assertion generation. However, prior work has primarily focused on generating assertions that closely mimic human-written ones. Because mimicking developer-written assertions is only one possible generation strategy, the impact of alternative approaches on overall quality remains poorly understood. This paper presents an empirical study evaluating four distinct generation strategies: Assertion Generation, which was proposed and evaluated in prior work, alongside Assertion Augmentation, Blind Augmentation, and Chain-of-Thought Generation. Using GPT-oss 20b as the underlying model, we evaluate these strategies on 811 test oracles from 10 open-source projects in the GitBug-Java benchmark. We assess the generated assertions in terms of correctness, fault-detection capability, and textual similarity to developer-written assertions. Our results show that the choice of generation strategy strongly influences performance. Assertion Augmentation performs best overall, achieving the highest compilation rate, execution validity, and mutation score. Meanwhile, Chain-of-Thought Generation detects the highest proportion of real bugs, and standalone Assertion Generation produces the assertions most textually similar to developer-written ones. Overall, the findings demonstrate that providing LLMs with existing developer-written assertions substantially improves the quality and effectiveness of generated test oracles.

Files

Research_Paper.pdf

(pdf | 0.379 Mb)

License info not available