Automated test generation is a critical area of research in software engineering, aiming to reduce manual effort while improving software reliability. While substantial work has focused on statically typed languages, dynamically typed languages such as JavaScript remain underexplored despite their widespread use and unique challenges. This thesis investigates the current state of JavaScript test generation by systematically evaluating state-of-the-art search-based and large language model (LLM)-based tools.
We first analyze existing benchmarks to assess their coverage of representative language features, identifying gaps that limit the ability to fairly compare tool performance. We then construct a curated dataset of real-world JavaScript projects and evaluate the LLM-based tool TestPilot and the search-based tool SynTest using a combination of quantitative metrics (e.g., code coverage, pass rates) and feature-based correlation analysis. Our results reveal that TestPilot tends to achieve higher coverage (median branch coverage of 27.9\% vs.\ 11.2\%) and generate more readable tests, but produces a larger number of failing or low-value test cases, while SynTest generates more stable and focused test suites yet can struggle with complex or dynamic code constructs. Our similarity analysis shows that each approach covers code the other does not, suggesting complementary strengths.
This study highlights the need for standardized, language-aware benchmarks and contributes a curated dataset and evaluation framework for JavaScript test generation tools. By systematically comparing search-based and LLM-based approaches, this thesis offers insights into their respective strengths, limitations, and opportunities for hybrid strategies, advancing the state of automated testing for dynamically typed languages.