Retrieval First: LLM-Assisted Type Inference for Automatic Test Case Generation in JavaScript

Master Thesis (2026)
Author(s)

L. Negru (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.J.G. Olsthoorn – Mentor (TU Delft - Software Engineering)

A. Panichella – Mentor (TU Delft - Software Engineering)

C. Lofi – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
23-04-2026
Awarding Institution
Delft University of Technology
Project
CS5000 Master Thesis
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automatic test case generation for dynamically typed languages such as JavaScript is significantly hindered by the absence of explicit type information, which expands the search space for search-based testing and reduces its effectiveness. While prior probabilistic and neural type inference methods address this problem, they struggle with complex user-defined types, higher-order functions, and external package dependencies. This paper presents and evaluates three LLM-based approaches for type inference in JavaScript. The primary contribution is a Retrieval-Augmented Generation (RAG) approach that constructs a vector database of semantically rich code embeddings. These embeddings include abstract syntax trees (ASTs), program slices, and code annotations, which enables efficient, project-wide context retrieval that is paired with Chain-of-Thought prompting. In a large-scale empirical evaluation against the SynTest framework, the RAG approach achieves a 29% average accuracy improvement over non-RAG LLM approaches, an 85% reduction in computation time, and a 63% accuracy improvement over probabilistic inference for deep, user-defined types. For primitive types, probabilistic methods remain competitive. These findings motivate future hybrid strategies that combine probabilistic and LLM-based inference.
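The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the names `CodeChunk`, `embed`, `VectorIndex`, and `buildPrompt` are hypothetical, the in-memory index stands in for a real vector database, and a sparse term-frequency vector replaces the learned code embeddings. It only shows the general shape of the idea, which is to index project code, retrieve the chunks most similar to the function under test, and place them in a prompt that asks for step-by-step reasoning.

```typescript
// Hypothetical record for one indexed piece of project code
// (for example a program slice or an annotated declaration).
interface CodeChunk {
  id: string;
  text: string;
  vector: Map<string, number>;
}

// Toy stand-in for a learned embedding model (an assumption of this sketch):
// a sparse term-frequency vector over identifier-like tokens.
function embed(text: string): Map<string, number> {
  const vector = new Map<string, number>();
  const tokens = text.toLowerCase().match(/[a-z_$][a-z0-9_$]*/g) ?? [];
  for (const token of tokens) {
    vector.set(token, (vector.get(token) ?? 0) + 1);
  }
  return vector;
}

// Cosine similarity between two sparse vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((value, key) => {
    normA += value * value;
    dot += value * (b.get(key) ?? 0);
  });
  b.forEach((value) => {
    normB += value * value;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Minimal in-memory index; a real system would use a vector database.
class VectorIndex {
  private chunks: CodeChunk[] = [];

  add(id: string, text: string): void {
    this.chunks.push({ id, text, vector: embed(text) });
  }

  // Return the k chunks most similar to the query, best match first.
  topK(query: string, k: number): CodeChunk[] {
    const queryVector = embed(query);
    return this.chunks
      .map((chunk) => ({ chunk, score: cosine(queryVector, chunk.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((entry) => entry.chunk);
  }
}

// Assemble a prompt from the target function and the retrieved context.
// The wording is illustrative, not the prompt used in the thesis.
function buildPrompt(target: string, context: CodeChunk[]): string {
  const contextBlock = context
    .map((chunk) => `// ${chunk.id}\n${chunk.text}`)
    .join("\n\n");
  return [
    "Relevant project code:",
    contextBlock,
    "Function under test:",
    target,
    "Reason step by step about how each parameter is used, then state its most likely type.",
  ].join("\n\n");
}

// Usage: index a few declarations, then retrieve context for one function.
const index = new VectorIndex();
index.add("rect", "class Rectangle { constructor(width, height) { this.width = width; this.height = height; } }");
index.add("user", "class User { constructor(name, email) { this.name = name; this.email = email; } }");
index.add("log", "function log(message) { console.log(message); }");

const target = "function area(shape) { return shape.width * shape.height; }";
const retrieved = index.topK(target, 1);
console.log(buildPrompt(target, retrieved));
```

In this toy setup the `Rectangle` declaration ranks first for `area` because the two share the `width` and `height` identifiers, so the prompt contains the user-defined type that the parameter `shape` most plausibly refers to.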

Repository link: https://doi.org/10.5281/zenodo.19496755
Replication package of "Retrieval First: LLM-Assisted Type Inference for Automatic Test Case Generation in JavaScript"
