Retrieval First: LLM-Assisted Type Inference for Automatic Test Case Generation in JavaScript

Master Thesis (2026)

Author(s)

L. Negru (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.J.G. Olsthoorn – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Panichella – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

C. Lofi – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Type Inference, JavaScript, Empirical Software Engineering, Automatic Test Case Generation, Large Language Model, Retrieval Augmented Generation

To reference this document use

https://resolver.tudelft.nl/uuid:1c41bab2-5218-4b6c-aacc-370cae850fa3

Publication Year

2026

Language

English

Graduation Date

23-04-2026

Awarding Institution

Delft University of Technology

Project

CS5000 Master Thesis

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Automatic test case generation for dynamically typed languages such as JavaScript is significantly hindered by the absence of explicit type information, which enlarges the search space for search-based testing and reduces its effectiveness. Prior probabilistic and neural type inference methods address this problem, but they struggle with complex user-defined types, higher-order functions, and external package dependencies. This paper presents and evaluates three LLM-based approaches for type inference in JavaScript. The primary contribution is a Retrieval-Augmented Generation (RAG) approach that constructs a vector database of semantically rich code embeddings, which include ASTs, program slices, and code annotations. The database enables efficient, project-wide context retrieval, which is paired with Chain-of-Thought prompting. In a large-scale empirical evaluation against the SynTest framework, the RAG approach achieves a 29% average accuracy improvement over non-RAG LLM approaches, an 85% reduction in computation time, and a 63% accuracy improvement over probabilistic inference for deep, user-defined types. For primitive types, probabilistic methods remain competitive. These findings motivate future hybrid strategies that combine probabilistic and LLM-based inference.
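
The retrieval-first idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the bag-of-words `embed` function stands in for a real embedding model, and `SnippetStore`, `build_prompt`, and the JavaScript snippets are hypothetical names and data chosen for illustration. The sketch indexes project snippets, retrieves the ones most similar to the function whose parameter types are unknown, and places them in a Chain-of-Thought style prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): identifier counts instead of a learned vector."""
    return Counter(re.findall(r"[A-Za-z_]\w*", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SnippetStore:
    """Hypothetical in-memory stand-in for a vector database of code snippets."""

    def __init__(self):
        self.entries = []  # list of (snippet, vector) pairs

    def add(self, snippet):
        self.entries.append((snippet, embed(snippet)))

    def retrieve(self, query, k=2):
        """Return the k snippets most similar to the query, best first."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [snippet for snippet, _ in ranked[:k]]


def build_prompt(target, context):
    """Assemble a Chain-of-Thought style prompt from the retrieved context."""
    joined = "\n".join(f"- {snippet}" for snippet in context)
    return (
        "Relevant project code:\n"
        f"{joined}\n\n"
        "Infer the parameter types of this JavaScript function. "
        "Reason step by step before giving the final types.\n"
        f"{target}"
    )


# Illustrative project snippets (made up for this sketch).
store = SnippetStore()
store.add("class Rectangle { constructor(width, height) "
          "{ this.width = width; this.height = height; } }")
store.add("function parseDate(text) { return new Date(text); }")
store.add("const shape = new Rectangle(2, 3); area(shape);")

target = "function area(shape) { return shape.width * shape.height; }"
context = store.retrieve(target, k=2)
prompt = build_prompt(target, context)
print(prompt)
```

In this toy setup the call site and the `Rectangle` class rank above the unrelated `parseDate` function, so the prompt carries the evidence that `shape` is likely a user-defined `Rectangle`. A real pipeline would replace `embed` with a learned code embedding and send `prompt` to an LLM.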

Repository link: https://doi.org/10.5281/zenodo.19496755
Replication package of "Retrieval First: LLM-Assisted Type Inference for Automatic Test Case Generation in JavaScript"

Files

Master_Thesis_Lucian_Negru.pdf

(pdf | 1.4 MB)

License info not available