M.J.G. Olsthoorn | TU Delft Repository

Empirical Analysis of SBST tools: A taxonomy of coverage gaps

Master thesis (2025) - N. Rusnac (author), Mitchell Olsthoorn (mentor), Arie van Deursen (graduation committee member), Jeremie Decouchant (graduation committee member)

Search-Based Software Testing (SBST) tools can automatically generate tests to achieve high code coverage; however, a systematic understanding of why they fail in specific situations is necessary. This thesis addresses this gap by developing a comprehensive taxonomy of coverage f ...

Distilling Knowledge for Assertion Generation: Alpha-Temperature Tuning in Smaller Language Models

Bachelor thesis (2025) - K. Hristov (author), Annibale Panichella (mentor), Mitchell Olsthoorn (mentor), P. Kellnhofer (graduation committee member)

Testing of software is crucial to the quality of the final product. Manual test assertion creation has become a significant bottleneck in the development process, which delays release. Having shown promise in generating assertions automatically, Large language models (LLMs) have s ...

Creating Local LLMs for Test Assertion Generation: A Comparative Study of Knowledge Distillation from CodeT5

Bachelor thesis (2025) - G. Dimitrov (author), Annibale Panichella (mentor), Mitchell Olsthoorn (mentor), P. Kellnhofer (graduation committee member)

Effective test assertions are important for software quality, but their creation is time-consuming. While Large Language Models (LLMs) show promise in automated assertion generation, their size, cost, resource demands, and need for online connection often render them impractical ...

May the Delays Be Ever in Your Favor: Genetic Operators in Delay-Based Testing of the XRPL Consensus Algorithm

Bachelor thesis (2025) - W.R. Kanhai (author), Burcu Özkan (mentor), Mitchell Olsthoorn (mentor), Annibale Panichella (mentor), J.E.A.P. Decouchant (graduation committee member)

The XRP Ledger (XRPL) relies on a Byzantine fault-tolerant consensus algorithm to ensure global agreement on transactions across distributed nodes. Despite its critical financial role, the implementation remains under-tested. While prior work has shown the potential of evolutiona ...

EvoPriority

Evaluating Fitness Functions in Priority-Based Evolutionary Testing for the XRP Ledger Consensus Protocol

Bachelor thesis (2025) - C. Ciocănea (author), Burcu Özkan (mentor), Annibale Panichella (mentor), Mitchell Olsthoorn (mentor), J.E.A.P. Decouchant (graduation committee member)

The XRP Ledger Consensus Protocol is a Byzantine fault-tolerant algorithm that enables the XRP Ledger to reach agreement on which transactions to apply, supporting millions of transactions daily. While the protocol is correct by design, its practical implementation is vulnerable ...

Groot: Impact of Evolutionary Operators in XRPL Testing using Priority-Based Event Representation

Bachelor thesis (2025) - B.J.A. Wassenaar (author), Burcu Özkan (mentor), Mitchell Olsthoorn (mentor), Annibale Panichella (mentor), J.E.A.P. Decouchant (graduation committee member)

The decentralized nature of blockchain systems makes them prone to concurrency bugs, which are difficult to detect. There exist testing techniques to find these bugs, such as systematic exploration of the solution space, but these techniques are difficult to scale. Evolutionary a ...

Survival of the Fittest

Evaluating Fitness Functions for Concurrency Testing on the XRPL Consensus Protocol

Bachelor thesis (2025) - A. Mousavi Gourabi (author), Burcu Özkan (mentor), Mitchell Olsthoorn (mentor), Annibale Panichella (mentor), J.E.A.P. Decouchant (graduation committee member)

Distributed systems, such as blockchains, can have bugs around edge cases that are hard to detect or trigger. Previous publications have introduced guided-search testing approaches that are able to find edge cases more efficiently than through conducting a systematic and exhausti ...

Efficient Local Test Assertion Generation

Distilling CodeT5+ for Reduced Model Size and High Accuracy

Bachelor thesis (2025) - D. Wu (author), Mitchell Olsthoorn (mentor), Annibale Panichella (mentor), P. Kellnhofer (mentor)

Effective software testing relies on the quality and correctness of test assertions. Recent Large Language Models (LLMs), such as CodeT5+, have shown significant promise in automating assertion generation tasks; however, their substantial computational resource demands limit thei ...

Distilling CodeT5 for Efficient On-Device Test-Assertion Generation

Combining response-based distillation and architectural tuning to deliver near-teacher quality on resource-constrained devices

Bachelor thesis (2025) - A.V. Nicula (author), Annibale Panichella (mentor), Mitchell Olsthoorn (mentor), P. Kellnhofer (graduation committee member)

Writing clear, semantically rich test assertions remains a major bottleneck in software development. While large pre-trained models such as CodeT5 excel at synthesizing assertions, their size and latency make them impractical for on-premise or resource-constrained workflows. In th ...

LLM-Seeded Evolutionary Testing (LSET): Enhancing Search-Based Software Testing by Incorporating Complex Method Sequences

Master thesis (2025) - C.R.E. Li (author), Mitchell Olsthoorn (mentor), A. Deljouyi (mentor), Annibale Panichella (mentor), Sicco Verwer (graduation committee member)

Writing test cases is an important yet complex task. Search-Based Software Testing (SBST) is an automated test case generation technique that aims to help developers by creating high-coverage test cases. Despite its strengths, a major limitation of this technique is that it often ...

Hyperparameter-Tuned Randomized Testing for Byzantine Fault-Tolerance of the XRP Ledger Consensus Protocol

Bachelor thesis (2025) - A. Macijauskaitė (author), Burcu Özkan (mentor), Mitchell Olsthoorn (mentor), Annibale Panichella (mentor), J.E.A.P. Decouchant (graduation committee member)

Blockchain systems rely on consensus protocols to ensure agreement among nodes even in the presence of malicious or faulty nodes. A consensus protocol that provides safety and liveness guarantees under such conditions is known as a Byzantine fault‑tolerant (BFT) protocol. Various ...

evoLLve'M: Improving JUnit Test Assertions and Mutation Score Using ChatGPT-4o and EvoSuite

Bachelor thesis (2024) - D.A. Turhan (author), Mitchell Olsthoorn (mentor), Annibale Panichella (graduation committee member)

Software testing is a vital yet time-consuming process during the development lifecycle, often causing engineers to limit its use in practice. In order to encourage active software testing, researchers have shown significant advances in automatic unit test case generation with ...

Using local LLMs in constrained environments for increasing mutation score

Bachelor thesis (2024) - R.R.L. van der Geest (author), Annibale Panichella (mentor), Mitchell Olsthoorn (mentor), Casper Poulsen (graduation committee member)

Mutation testing is a way to test the effectiveness of a test suite for catching bugs in a given piece of code. Writing these tests manually can be cumbersome and time-consuming. Automated tools can be used to generate tests that achieve a high mutation score. The output of these ...

Evaluating the Effectiveness of Meta Llama3 70B for Unit Test Generation

Bachelor thesis (2024) - R.J.H. Schep (author), Annibale Panichella (mentor), Mitchell Olsthoorn (mentor), Casper Poulsen (graduation committee member)

The automated generation of test suites is crucial for enhancing software quality and efficiency. Manually writing tests is time-consuming and accounts for about 15% of project time, while tests generated by automated tools like EvoSuite and Pynguin often lack readability and comp ...

Enhancing Unit Tests using ChatGPT-3.5

Bachelor thesis (2024) - S. Creastă (author), Annibale Panichella (mentor), Mitchell Olsthoorn (mentor), Casper Poulsen (graduation committee member)

Manually crafting test suites is time-consuming and susceptible to bugs. The automation of this process has the potential to make this task more appealing. While current tools like EvoSuite manage to obtain high coverage, their generated tests are not always readable. Recent lit ...

The Effectiveness of GPT-4o for Generating Test Assertions

Bachelor thesis (2024) - A. Bagdonas (author), Mitchell Olsthoorn (mentor), Annibale Panichella (mentor), Casper Poulsen (graduation committee member)

Over the last few years, Large Language Models have become remarkably popular in research and in daily use with GPT-4o being the most advanced model from OpenAI as of the publishing of this paper. We assessed its performance in unit test generation using mutation testing. 20 Java ...

The Impact of Test Case Clustering on Comprehending Automatically Generated Test Suites

Master thesis (2023) - L. Lin (author), A. Panichella (mentor), Mitchell Olsthoorn (mentor), Willem Paul Brinkman (graduation committee member)

Software testing, a critical phase in the software development lifecycle, is often hindered by the time-intensive and costly manual creation of test cases. While automating test case generation could mitigate these challenges, its adoption in the industry has been limited due to ...

Test Program-Based Generative Fuzzing for Differential Testing of the Kotlin Compiler

Master thesis (2023) - C.A. Georgescu (author), A. Panichella (mentor), Mitchell Olsthoorn (mentor), Sicco Verwer (graduation committee member)

Kotlin is a programming language best known for its interoperability with Java, as well as the measurable improvements it offers over it. Since it became Android’s go-to language in 2019, the popularity and impact of Kotlin have risen greatly. Amidst this surge in popularity, the Kotlin developer team is working on a new version of the compiler that introduces sweeping changes to the ecosystem. Traditional compiler testing is a manual and laborious task that requires extensive developer effort and expertise. In an attempt to mitigate this, researchers have invested great resources in developing and perfecting automated compiler testing tools over the last decades. These approaches generate new pieces of code to test the behavior of compilers, which is assessed through differential testing. However, the usage of heuristics as guidance for the generative process is not well understood, and no approach that generates Kotlin code from scratch currently exists. In this thesis, we propose a novel method of enriching standard grammar specifications with language-targeted semantic context that is integrated in the sampling process. We structure generated code hierarchically and use it as the base of an evolutionary computation framework. Within this framework, we introduce two classes of algorithms that are novel to the field of compiler fuzzing, based on syntactic diversity and semantic proximity, respectively. We carry out an empirical analysis spanning 200K generated Kotlin files, which we analyzed through different Kotlin compiler versions. Our results uncovered five previously unreported categories of bugs, which we reported to the Kotlin compiler developer team. The developers verified and replicated our instances on the current release of the Kotlin compiler, and have assigned target release dates for fixes within the current major version of the compiler. The study also provides new insight into the effects of heuristic-specific hyperparameters such as expression simplicity, dissimilarity measurements, and target selection.

Performance of the Pareto Envelope-Based Search Algorithm - II in Automated Test Case Generation

Bachelor thesis (2023) - A. Abhishek (author), Annibale Panichella (mentor), Mitchell Olsthoorn (mentor), Dimitri Stallenberg (mentor), S.E. Verwer (graduation committee member)

Software testing is an important yet time-consuming task in the software development life cycle. Artificial Intelligence (AI) algorithms have been used to automate this task and have proven to be proficient at it. This research focuses on the automated testing of JavaScript progr ...

Can RVEA with DynaMOSA features perform well at generating test cases?

Bachelor thesis (2023) - S. Datskiv (author), Annibale Panichella (mentor), Mitchell Olsthoorn (mentor), Dimitri Stallenberg (mentor), S.E. Verwer (graduation committee member)

On an intuitive level, software testing is important because it assures the quality of the software used by humans. However, ensuring this quality is not an easy task because as the complexity of the software increases, so do the efforts to test it. Search-based software testing ...