How does LLM-based test generation for Java libraries perform when full source code is available?

Evaluating LLM-based test generation for libraries across code representations

Bachelor Thesis (2026)

Author(s)

Cao Minh Nguyen Cao Minh (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Cathrine Paulsen – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Sebastian Proksch – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Large Language Models (LLMs), LLM-based test generation, Dependency validation

To reference this document use

https://resolver.tudelft.nl/uuid:0a3ad840-f579-48cc-84db-3bc4c5a5da94

More Info

Publication Year

2026

Language

English

Graduation Date

26-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

As software projects depend heavily on open-source libraries, developers use tests to ensure that dependency updates remain behaviourally compatible. However, such library tests are often incomplete or unavailable. Although automated test generation tools such as EvoSuite exist and Large Language Models (LLMs) have shown promise in generating more readable tests, most evaluations have been conducted on benchmark datasets or popular GitHub projects. This leaves a gap in understanding how effective LLM-generated tests are for released library artifacts. In this paper, we evaluate LLM-based test generation for released Java libraries from Maven Central to assess its feasibility in dependency validation workflows. We implement a pipeline that provides source code and method context to a locally hosted LLM, validates the generated tests, and applies iterative repair when needed. Our results show that tests generated by the local model achieve substantially lower coverage than EvoSuite, primarily due to compilation failures, highlighting that symbol resolution errors remain a key challenge in generating tests with LLMs. We further show that iterative repair is effective at improving the coverage of generated tests, and that a stronger cloud-hosted model even surpasses EvoSuite in coverage. Overall, the findings indicate that LLM-based test generation from source code is a promising approach for dependency update validation when combined with sufficiently capable models and iterative repair mechanisms.
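
The generate, validate, and repair steps named in the abstract can be illustrated with a minimal Java sketch. This is not the thesis implementation, which this record does not show: the class `RepairLoopSketch`, its nested types, the `maxRepairRounds` parameter, and the stubbed model and validator in `main` are all hypothetical and chosen only for illustration. A real pipeline would call an LLM for generation and repair, and a compiler plus test runner for validation.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

public class RepairLoopSketch {

    /** Hypothetical validation outcome: ok, or a failure with a diagnostic message. */
    public static final class Validation {
        public final boolean ok;
        public final String errorMessage;

        private Validation(boolean ok, String errorMessage) {
            this.ok = ok;
            this.errorMessage = errorMessage;
        }

        public static Validation pass() { return new Validation(true, ""); }
        public static Validation fail(String message) { return new Validation(false, message); }
    }

    /** Final candidate test, whether it validated, and how many repair rounds were used. */
    public static final class Result {
        public final String testSource;
        public final boolean valid;
        public final int repairRounds;

        Result(String testSource, boolean valid, int repairRounds) {
            this.testSource = testSource;
            this.valid = valid;
            this.repairRounds = repairRounds;
        }
    }

    /**
     * Generates one candidate test, then alternates validation and repair until the
     * candidate validates or the repair budget (an assumed parameter) is exhausted.
     */
    public static Result generateWithRepair(
            String methodContext,
            Function<String, String> generate,
            Function<String, Validation> validate,
            BiFunction<String, String, String> repair,
            int maxRepairRounds) {
        String candidate = generate.apply(methodContext);
        for (int round = 0; ; round++) {
            Validation v = validate.apply(candidate);
            if (v.ok) {
                return new Result(candidate, true, round);
            }
            if (round == maxRepairRounds) {
                return new Result(candidate, false, round);
            }
            // Feed the diagnostic back so the next attempt can address it.
            candidate = repair.apply(candidate, v.errorMessage);
        }
    }

    public static void main(String[] args) {
        // Stubs standing in for the model and the compiler: the first attempt
        // refers to an unresolved symbol, and the repair step replaces it.
        Result r = generateWithRepair(
                "int add(int a, int b)",
                ctx -> "assertEquals(3, Calc.ad(1, 2));",
                src -> src.contains("Calc.add(")
                        ? Validation.pass()
                        : Validation.fail("cannot find symbol: method ad"),
                (src, err) -> src.replace("Calc.ad(", "Calc.add("),
                2);
        System.out.println("valid=" + r.valid + " after " + r.repairRounds + " repair round(s)");
    }
}
```

Bounding the number of repair rounds keeps the loop from running indefinitely on a candidate that never validates; the budget of 2 used in `main` is arbitrary.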

Files

Cao_Minh_Nguyen_-_Final_Resear... (pdf)

(pdf | 0.239 MB)

License info not available