LLM-Based Unit Test Generation Without Source Code

An Empirical Evaluation of Bytecode Representations, Prompt Engineering, Model Selection, and Temperature Settings

Bachelor Thesis (2026)

Author(s)

A.Z. Głodek (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C.R. Paulsen – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

S. Proksch – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Large language models, Unit test generation, Bytecode analysis

To reference this document use

https://resolver.tudelft.nl/uuid:f7a36a46-3099-495d-8d07-66d58a79e735

Publication Year

2026

Language

English

Graduation Date

22-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automated unit test generation is often used to reduce the manual effort required to create and maintain software tests. Recently, Large Language Models (LLMs) have shown promising results in generating unit tests directly from source code; however, existing work assumes that source code is available, which is not always the case. Many third-party libraries are distributed only as compiled artifacts, making source-code-based test generation difficult or impossible. Generating tests from compiled software could help developers evaluate the behavioural compatibility of dependency updates without access to the original codebase, but it remains unclear how well LLMs perform when only bytecode is accessible.

This research investigates LLM-based unit test generation using only bytecode. I developed an automated pipeline that generates, compiles, executes, and evaluates JUnit tests for Java libraries from disassembled and decompiled bytecode. I used this pipeline to study how model choice, representation, prompting, and temperature affect compilation, execution, and coverage. I also evaluated the best configuration using iterative prompting and compared it against EvoSuite.

Across 50 Java libraries, the best configuration achieved 89.5% compilation success and 83.6% execution success. Few-shot prompting and higher temperatures produced the strongest single-pass results, while iterative prompting nearly doubled coverage. In the comparison with EvoSuite, the approach produced usable tests for all 19 evaluated libraries, whereas EvoSuite succeeded on only 9, although it achieved higher coverage on the libraries where it succeeded. These results suggest that bytecode-based LLM test generation is promising when source code is unavailable.

Files

AnnaGlodek_ThesisFinal.pdf

(pdf | 0.235 Mb)

License info not available