M.T.J. Spaan | TU Delft Repository

Using NoisyNet to Improve Exploration in Contextual Bandit Settings

Bachelor thesis (2025) - S. Ruff (author) , Pascal R. van der Vaart (mentor) , N. Yorke-Smith (mentor) , MTJ Spaan (graduation committee member)

Efficient exploration is a major issue in reinforcement learning, particularly in environments with sparse rewards. In these environments, traditional methods like ε-greedy fail to efficiently reach an optimal policy. A new method proposed by Fortunato et al. s ...

A Unified Scaling Law for Bootstrapped DQNs

Bachelor thesis (2025) - R. Knyazhitskiy (author) , Pascal R. van der Vaart (mentor) , N. Yorke-Smith (mentor) , MTJ Spaan (graduation committee member)

We present a large-scale empirical study of Bootstrapped DQN (BDQN) and Randomized-Prior BDQN (RP-BDQN) in the DeepSea environment, aimed at characterizing their scaling properties. Our primary contribution is a unified scaling law that accurately models the probability of reward ...

Extrapolating Learning Curves: When Do Neural Networks Outperform Parametric Models?

Bachelor thesis (2025) - A. Cazacu (author) , Tom Julian Viering (mentor) , C. Yan (mentor) , S. Mukherjee (mentor) , M.T.J. Spaan (graduation committee member)

Learning curve extrapolation helps practitioners predict model performance at larger data scales, enabling better planning for data collection and computational resource allocation. This paper investigates when neural networks outperform parametric models for this task. We conduc ...

Revisiting Langevin Monte Carlo Applied to Deep Q-Learning: An Empirical Study of Robustness and Sensitivity

Bachelor thesis (2025) - P. Hendriks (author) , N. Yorke-Smith (mentor) , Pascal R. van der Vaart (mentor) , MTJ Spaan (graduation committee member)

Deep Reinforcement Learning has achieved superhuman performance in many tasks, such as robotic control or autonomous driving. Algorithms in Deep Reinforcement Learning still suffer from a sample efficiency problem, where, in many cases, millions of samples are needed to achieve g ...

How Noisy Is Too Noisy?

Robust Extrapolation of Learning Curves with LC-PFN

Bachelor thesis (2025) - R.M. Gherasa (author) , C. Yan (mentor) , S. Mukherjee (mentor) , Tom Julian Viering (mentor) , M.T.J. Spaan (graduation committee member)

Accurately predicting a machine learning model’s final performance based on only partial training data can save substantial computational resources and guide early stopping, model selection, and automated machine learning (AutoML) workflows. Learning Curve Prior-Fitted Networks ( ...

The Impact of Imbalanced Training Data on Learning Curve Prior-Fitted Networks

Bachelor thesis (2025) - B. Kostov (author) , C. Yan (mentor) , S. Mukherjee (mentor) , Tom Julian Viering (mentor) , M.T.J. Spaan (graduation committee member)

Learning curves represent the relationship between the amount of training data and the error rate in machine learning. An important use case for learning curves is extrapolating them in order to predict how much data is needed to achieve a certain performance. One way to do such ...

Effectiveness of Machine Learning Models in Classifying Learners Based on Learning Curves

Improving Our Understanding of Learning Curves Through the Process of Classification

Bachelor thesis (2025) - S. Basaran (author) , C. Yan (mentor) , S. Mukherjee (mentor) , Tom Julian Viering (mentor) , M.T.J. Spaan (graduation committee member)

In machine learning, learning curves are a metric that plots performance versus training set size. They inform decisions about data acquisition, model selection, and hyperparameter tuning. Despite their importance, recent research suggests that our understanding of learning curve ...

The Effect of Domain Shift on Learning Curve Extrapolation

Bachelor thesis (2025) - M. Soeters (author) , Tom Julian Viering (mentor) , C. Yan (mentor) , S. Mukherjee (mentor) , M.T.J. Spaan (graduation committee member)

Domain shift is when the distribution of data differs between the training of a model and its testing. This can happen when the conditions of training are slightly different from the conditions that will happen when a model is tested or used. This is a problem for generalizabilit ...

Real-time Adaptive Nonlinear MPC for Collision Imminent Control and Planning in Automated Vehicles

Enforcing constraints and utilizing the full control potential

Master thesis (2024) - K. Trip (author) , M. Mazo (mentor) , M. T.J. Spaan (graduation committee member)

With the introduction of autonomous vehicles on public roads, their performance in emergency situations has become a strong focus. Collision Imminent Control (CIC) concerns the planning and control of aggressive evasive maneuvers for collision avoidance of automated vehicles. CIC ...

Comparing Static Semantics Specifications for the IceDust DSL: A Case Study of Statix

Master thesis (2023) - J. Tilro (author) , D.M. Groenewegen (mentor) , Benedikt Ahrens (graduation committee member) , Matthijs Spaan (graduation committee member)

Reusable tools for engineering software languages can bridge the gap between formal specification and implementation, lowering the bar for engineers to design and implement programming languages. Among such tools belong NaBL2 and its successor Statix, which are meta-languages for ...

Elaine: Elaborations of Higher-Order Effects as First-Class Language Feature

Master thesis (2023) - T. Diepraam (author) , C.R. van der Rest (mentor) , Casper Bach Bach (graduation committee member) , M.T.J. Spaan (graduation committee member)

Algebraic effects and handlers have become a popular abstraction for effectful computation, with implementations even in mainstream programming languages, such as OCaml. The operations of an algebraic effect define the syntax of the effect, while handlers define the semantics. Th ...

Elastic gradient boosting decision trees under limited labels by sequential epistemic uncertainty quantification

Elastic CatBoost Uncertainty (eCBU)

Master thesis (2023) - E.J. Sennema (author) , Anna Lukina (mentor) , Yury Zhauniarovich (mentor) , E. Bárbaro (mentor) , M.T.J. Spaan (graduation committee member) , DMJ Tax (graduation committee member)

Intrusion detection systems (IDSs) are essential for protecting computer systems and networks from malicious attacks. However, IDSs face challenges in dealing with dynamic and imbalanced data, as well as limited label availability. In this thesis, we propose a novel elastic gradi ...

VoBERT: Unstable Log Sequence Anomaly Detection

Introducing Vocabulary-Free BERT

Master thesis (2023) - D. Hofman (author) , Anna Lukina (mentor) , Y. Zhauniarovich (mentor) , E. Bárbaro (mentor) , Matthijs Spaan (graduation committee member) , Sicco Verwer (graduation committee member)

With the ever-increasing digitalisation of society and the explosion of internet-enabled devices with the Internet of Things (IoT), keeping services and devices secure is becoming more important. Logs play a critical role in sustaining system reliability. Manual analysis of logs has become increasingly difficult, accelerating the development of automated methods for log-anomaly detection. Despite significant progress in automating log analysis, current state-of-the-art methods face challenges dealing with unstable log data, which means that the content of log messages evolves over time.

We show that LogBERT, a state-of-the-art technique based on Bidirectional Encoder Representations from Transformers (BERT), cannot deal with unstable log data. On the three most prevalent publicly available log datasets, the Matthews Correlation Coefficient (MCC) score of LogBERT (which measures the correlation between a model's output and the correct labels) dropped by 90% after log data instability was increased from 1% to over 80% normal sequences containing logkeys in the test set. Log data instability was increased only by reassigning samples between the train and test sets. Furthermore, we show that the high performance of LogBERT reported in the original paper was achieved because the model relied on a simple heuristic that only worked under specific conditions.
To address this issue, we propose a novel sequence anomaly detection technique based on BERT: Vocabulary-Free BERT (VoBERT). VoBERT uses a novel pre-training task we designed specifically for anomaly detection: Vocabulary-Free Masked Language Modeling (VF-MLM). We adapted traditional MLM and removed the fixed vocabulary constraint, which allows VF-MLM to classify out-of-vocabulary logkeys correctly.
We highlight that VoBERT is more stable than LogBERT and outperforms the latter in certain situations where log data is very unstable. For the public datasets, the MCC score obtained with the specific train-test split used in the LogBERT paper dropped by 90% once samples were reassigned between the train and test sets, which increased log data instability. In addition to sequence-level anomaly predictions, we evaluated all approaches at the element level, providing a more granular performance assessment.
To assess the generalisation of the experimental results to real-world scenarios, we conducted a case study evaluating the anomaly detection models on real-world security event data collected at a large bank (50,000+ employees). We found that the simple heuristic did not work for this real-world data, having a negative correlation with the correct results. VoBERT showed performance on par with LogBERT on this real-world security event dataset.
We urge future researchers to evaluate their methods on real-world data, as we showed that the commonly used public datasets do not represent real-world scenarios. Furthermore, it is important to assess how difficult it is to detect anomalies in datasets used for evaluation. When a simple heuristic can perform well, such datasets might not be well suited to evaluate a complex anomaly detection model.
This thesis is a proof of concept for the novel pre-training task VF-MLM and paves the way for future work to refine this technique further, as well as to develop additional robust and adaptable solutions for log and security event anomaly detection.
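The MCC metric used throughout this abstract can be illustrated with a minimal sketch. The function name `mcc` and the toy label lists are hypothetical and are not taken from the thesis; the formula is the standard binary-classification definition, and returning 0.0 when the denominator vanishes is a common convention assumed here.

```python
import math


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (1 = anomaly, 0 = normal).

    Hypothetical helper for illustration; not code from the thesis.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: score is 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(mcc(labels, [1, 1, 0, 0]))  # perfect agreement
    print(mcc(labels, [0, 0, 1, 1]))  # fully inverted predictions
    print(mcc(labels, [1, 1, 1, 1]))  # everything flagged as anomalous
```

A score near 1 means predictions track the labels, a score near 0 means no correlation, and a negative score means the predictions are anti-correlated with the labels, which is the situation the abstract describes for the simple heuristic on the bank data.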

Speeding up program synthesis using specification discovery

Master thesis (2023) - J. de Jong (author) , Sebastijan Dumančić (mentor) , J.G.H. Cockx (mentor) , T.R. Hinnerichs (mentor) , Matthijs T. J. Spaan (graduation committee member)

How convenient would it be to have an AI that relieves us programmers from the burden of coding? Program synthesis is a technique that achieves exactly that: it automatically generates simple programs that meet a given set of examples or adhere to a provided specification. This i ...

Computing the Scanwidth of Directed Acyclic Graphs

Master thesis (2023) - N.A.L. Holtgrefe (author) , Leo Iersel (mentor) , Mark Jones (mentor) , Matthijs Spaan (graduation committee member)

Phylogenetic networks are a specific type of directed acyclic graph (DAG), used to depict evolutionary relationships among, for example, species or other groups of organisms. To solve computationally hard problems, treewidth has been used to parametrize algorithms in phylogenetics. In the hope of simplifying the algorithmic design process, Berry, Scornavacca and Weller recently proposed a new measure of tree-likeness that takes into account the directions of the arcs: scanwidth. They showed that the corresponding decision problem of this parameter - which can be seen as a variant of directed cutwidth, using a tree instead of a linear ordering - is NP-complete. This thesis aims to widen the structural knowledge of scanwidth and to find efficient ways of computing it on general DAGs, both by exact and heuristic algorithms.

With the help of reduction rules, we construct an explicit dynamic programming algorithm that computes scanwidth exactly, along with its corresponding tree extension, in $O(k \cdot n^k \cdot m)$ time for rooted DAGs of scanwidth $k$. This slicewise polynomial algorithm proves that computing the scanwidth is in the complexity class XP. The algorithm also functions as an FPT algorithm for level-$\ell$ networks, with the complexity bounded by $O(2^{4\ell-1} \cdot \ell \cdot n + n^2)$. It performs well in practice, being able to compute the scanwidth of networks up to 30 reticulations and 100 leaves within 500 seconds.

On the heuristic side, an algorithm that repeatedly splits at a specific type of smallest cut is proposed. Enhanced with simulated annealing, this heuristic shows promising results, obtaining an average approximation ratio of 1.5 for large synthetic networks of 30 reticulations and 100 leaves. Applied to a real-world dataset of networks, the heuristic performs near-optimally. Although we prove that the scanwidth is always greater than or equal to the treewidth, experiments show that they are close to each other in practice. This further motivates the use of scanwidth over treewidth as a parameter in algorithms.
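To make the notion of a tree extension concrete, the sketch below evaluates the width of one given tree extension. It is not the thesis's dynamic programming algorithm or its heuristic. It assumes the definition from Berry, Scornavacca and Weller, which the abstract does not spell out: a tree extension is a rooted tree on the DAG's vertices in which the tail of every arc is an ancestor of its head, and its width is the largest number of arcs that start strictly above some vertex v and end at v or below it. The scanwidth of the DAG is the minimum of this width over all tree extensions. All names are hypothetical.

```python
def width_of_tree_extension(arcs, parent):
    """Width of a given tree extension of a DAG (illustrative sketch).

    arcs:   list of (tail, head) pairs of the DAG.
    parent: dict mapping each vertex to its parent in the tree
            extension; the root maps to None.
    """
    def strict_ancestors(v):
        found = set()
        p = parent[v]
        while p is not None:
            found.add(p)
            p = parent[p]
        return found

    anc = {v: strict_ancestors(v) for v in parent}

    # Assumed validity condition: every arc must point "downwards" in the tree.
    for tail, head in arcs:
        if tail not in anc[head]:
            raise ValueError(f"arc {tail}->{head} violates the tree extension")

    width = 0
    for v in parent:
        # Count arcs (x, y) with x strictly above v and y equal to or below v.
        crossing = sum(
            1 for x, y in arcs
            if x in anc[v] and (y == v or v in anc[y])
        )
        width = max(width, crossing)
    return width


if __name__ == "__main__":
    # Diamond-shaped DAG: r -> a, r -> b, a -> c, b -> c.
    diamond = [("r", "a"), ("r", "b"), ("a", "c"), ("b", "c")]
    # One tree extension: the path r, a, b, c.
    path = {"r": None, "a": "r", "b": "a", "c": "b"}
    print(width_of_tree_extension(diamond, path))
```

Computing the scanwidth itself requires minimising over all tree extensions, which is where the exact and heuristic algorithms described in the abstract come in; this helper only scores a single candidate.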

Bootstrapping the statix meta-language

Master thesis (2023) - B.F. Janssen (author) , B.P. Ahrens (mentor) , Matthijs T. J. Spaan (graduation committee member) , Aron Zwaan (graduation committee member) , C.B. Poulsen (graduation committee member)

The Statix meta-language has been developed in order to simplify the definition of static semantics in programming languages. A high-level static semantics definition of a language in Statix can be used to generate a type-checker, hence abstracting over the shared implementation ...

Combining Multi-Objective Planning with Reinforcement Learning to Solve Complex Tasks in Environments with Sparse Rewards

Master thesis (2023) - C. van Rijn (author) , A. Lukina (mentor) , M. T.J. Spaan (graduation committee member) , Frans A. Oliehoek (graduation committee member)

Sequential decision-making problems are problems where the goal is to find a sequence of actions that complete a task in an environment. A particularly difficult type of sequential decision-making problem to solve is one in which the environment has sparse rewards, a large state ...

Hierarchical Clustering Based State Abstraction In Reinforcement Learning

Master thesis (2022) - Y. Liu (author) , Yang Li (mentor) , M.T.J. Spaan (graduation committee member) , Simon H. Tindemans (coach)

Reinforcement learning (RL) has grown tremendously over one and a half decades and is increasingly emerging in many real-life applications. However, the application of RL is still limited due to its low training efficiencies and surplus training cost. The sampling and computation ...

Code Extraction from a Dependently Typed Language to a Stack Based Language

Bachelor thesis (2022) - L.M. Milliken (author) , Jesper Cockx (mentor) , L.F.B. Escot (mentor) , Matthijs T. J. Spaan (graduation committee member)

Dependently typed languages such as Agda can provide users certain guarantees about the correctness of the code that they write; however, this comes at the cost of excess code that is not used at run time. Agda code is currently compiled to another language before it is run, th ...

Agda2Rust: A Study on an Alternative Backend for the Agda Compiler

Bachelor thesis (2022) - H. Peeters (author) , J.G.H. Cockx (mentor) , L.F.B. Escot (mentor) , Matthijs T. J. Spaan (graduation committee member)

Agda is a functional programming language with built-in support for dependent types. A dependent type depends on a value. This allows the developer to specify strict constraints for the types used in an application. Writing code with dependent types results in fewer type-related ...