L.H. Applis | TU Delft Repository

Suspicious Types and Bad Neighborhoods

Filtering Spectra with Compiler Information

Conference paper (2025) - L.H. Applis (author) , M. P. Gissurarson (author) , A. Panichella (author)

Spectrum-based fault localization and its formulas often struggle with large spectra containing many expressions irrelevant to the fault, which impacts its overall effectiveness. Spectra can inflate for large programs or on finer granularity, such as expression-level coverage fro ...

Tool-Driven Quality Assurance for Functional Programming and Machine Learning

Doctoral thesis (2024) - L.H. Applis (author) , A. van Deursen (promotor) , Annibale Panichella (copromotor)

Finding and fixing software faults is a major part of software development, and as such any improvement for such tasks is a welcome aid for developers and a worthwhile field for researchers. Like programming in general, debugging and repair need specialized tools to provide the necessary information (like the usage of runtime resources) or assure quality (e.g. with test suites). Only then are developers able to repair faults without introducing new ones. There are also more sophisticated tools that provide stronger, more automated help to developers: program coverage summarizes run-time behavior, fault localization helps to narrow down suspicious parts of the code, and automated program repair suggests possible patches that lead to a passing test suite. On top of these approaches, large language models show promising capabilities to generate, alter and test source code, but they have yet to be tested and hardened for their security and quality.
To enable the next generation of state-of-the-art quality-assurance tools, this thesis investigates different techniques and their respective tools to improve their precision and correctness. To this end, we develop procedures to quantify the robustness of large language models of code and identify their weaknesses when facing metamorphic noise or statistically unlikely data. After examining the quality of tools, this dissertation works towards improving existing tools and approaches in the field of functional programming, particularly for Haskell. Functional programming is a field rich in unique options such as properties, strong type systems, and side-effect-free functions, but also in challenges such as non-strict evaluation.
Our results regarding large language models show that there are shortcomings when dealing with redundant elements and that such elements can be intentionally searched for. This implies a need for further improvement of the models so that they provide more consistent results under trivial changes.
The work centered around Haskell shows the value of utilizing compiler and language features to enhance existing techniques: program repair can be performed with a reduced search space due to compiler-suggested elements, stack traces and program coverage can be enhanced by introducing an evaluation trace, and fault localization is aided by types and expression-level granularity. While the implementation is specific, the approaches remain transferable: any Haskell feature used in this dissertation is (or can be) implemented for Java.
In summary, this thesis addresses different topics in software quality assurance and their tools by introducing novel information. It lays the groundwork to improve the next generation of development tools that utilize large language models or statically typed languages.

Searching for Quality: Genetic Algorithms and Metamorphic Testing for Software Engineering ML

Conference paper (2023) - L.H. Applis (author) , Annibale Panichella (author) , R.J. Marang (author)

More machine learning (ML) models are being introduced to the field of Software Engineering (SE) and have reached a stage of maturity to be considered for real-world use; but the real world is complex, and testing these models often lacks in explainability, feasibility and computational cap ...

HasBugs - Handpicked Haskell Bugs

Conference paper (2023) - L.H. Applis (author) , A. Panichella (author)

We present HasBugs, an extensible and manually curated dataset of 25 real-world Haskell bugs from 6 open source repositories. We provide a faulty, tested, and fixed version of each bug in our dataset with reproduction packages, description, and bug context. For technical users, t ...

CSI

Haskell - Tracing Lazy Evaluations in a Functional Language

Conference paper (2023) - M. P. Gissurarson (author) , L.H. Applis (author)

In non-strict languages such as Haskell the execution of individual expressions in a program significantly deviates from the order in which they appear in the source code. This can make it difficult to find bugs related to this deviation, since the evaluation of expressions does ...

PropR: Property-Based Automatic Program Repair

Conference paper (2022) - M. P. Gissurarson (author) , L.H. Applis (author) , Annibale Panichella (author) , Arie van Deursen (author) , David Sands (author)

Automatic program repair (APR) regularly faces the challenge of overfitting patches — patches that pass the test suite, but do not actually address the problems when evaluated manually. Currently, overfit detection requires manual inspection or an oracle making quality control of ...

BLEU it All Away!

Refocussing SE ML on the Homo Sapience

Abstract (2022) - L.H. Applis (author)

Many tasks in machine learning for software engineering rely on prominent NLP metrics, such as the BLEU or ROUGE score. The metrics are under heavy criticism themselves within the NLP community, but the SE community adopted them for lack of better alternatives. Wi ...

Assessing Robustness of ML-Based Program Analysis Tools using Metamorphic Program Transformations

Conference paper (2021) - L.H. Applis (author) , Annibale Panichella (author) , A. van Deursen (author)

Metamorphic testing is a well-established testing technique that has been successfully applied in various domains, including testing deep learning models to assess their robustness against data noise or malicious input. Currently, metamorphic testing approaches for machine learni ...