Deep Reinforcement Learning (DRL) is a powerful framework for training autonomous agents in complex environments. However, testing these agents remains prohibitively expensive due to the need for extensive simulations and the rarity of failure events, such as collisions or timeouts, in which the agent fails to complete its task safely or correctly. Surrogate models, such as Multi-Layer Perceptrons (MLPs), offer a promising alternative by predicting failures without requiring full simulation runs. However, prior research has focused almost exclusively on MLPs, leaving it unclear whether other, more expressive machine learning models could improve performance. In this paper, we investigate whether Bayesian Neural Networks (BNNs), which incorporate probabilistic reasoning into neural architectures, can serve as more effective surrogates for failure prediction in DRL environments. We developed, trained, and evaluated a BNN surrogate and compared it against a pre-trained MLP baseline, using the HighwayEnv car parking environment as our test case. Our evaluation compared the models' predictive accuracy, precision, recall, F1-score, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC) on the training data, and assessed their effectiveness in the DRL parking environment. The results show that the BNN surrogate outperforms the MLP baseline in terms of practical utility for failure discovery. These findings suggest that BNNs can be a more effective surrogate model for prioritising failure scenarios in DRL testing.