N. Yorke-Smith | TU Delft Repository

Reactive Scheduling Strategies for Advanced Aircraft Engine Parts Repair

Master thesis (2026) - A. Maniatis, N. Yorke-Smith, Paolo Monti, Zoë Lascaris, A. Bombelli

This thesis presents the development and evaluation of a scheduling framework for aircraft engine part repair tasks within the maintenance department of a major European airline. The study addresses inherent operational uncertainties, data limitations, and shortcomings of existing proactive scheduling practices. The proposed model extends the classic Resource Constrained Project Scheduling Problem (RCPSP) to one that combines multi-skill, maximum lags, and time-window constraints in a repair context. It employs a Tabu Search (TS) model with a custom lookahead decoder to increase feasibility, and integrates novel neighbourhood sampling techniques to handle large-scale operational data. The framework’s performance is assessed using a case study and sensitivity analysis to evaluate deadline adherence, schedule stability, and operational efficiency. Results demonstrate that an average tardiness objective offers the best trade-off between production efficiency and deadline adherence, and that its deadline-driven gradient yields schedules that stay robust as disruptions accumulate through the week. Different candidate lists for sampling based on the critical-path method perform on par with random sampling and show no statistically strong edge over it. Critical-path-based lists, however, rank top-two on every metric except stability and are never worst on any, supporting their use as a robust default. The findings highlight the interplay between a baseline and a repair scheduler and contribute to a more general understanding of scheduling in the repair context. ...

Global-Local Attention vs Graph Neural Networks in the Reinforcement Learning Approach for the Dynamic Berth Allocation Problem

Bridging the Optimality Gap in Dynamic Berth Allocation Problem via Global-Local Attention

Bachelor thesis (2026) - V. Anica-Popa, N. Yorke-Smith, C. March Moya, J.W. Böhmer

The Dynamic Berth Allocation Problem (DBAP) is an optimization problem in maritime logistics that seeks to minimize vessel delays and improve terminal efficiency through effective berth scheduling. This paper investigates how replacing the Graph Neural Network (GNN) encoder in a reinforcement learning approach to the DBAP with a Global-Local Attention mechanism affects the optimality gap. This new encoder architecture is designed to capture both local vessel-berth interactions and global scheduling information. The performance of the proposed architecture is evaluated against the baseline GNN across 2,700 benchmark instances spanning various terminal sizes and traffic congestion levels. Empirical results demonstrate that the Global-Local Attention variant yields a consistent improvement in schedule quality, reducing normalized operational costs by an average of 2.8%. While this modification provides notable optimization gains, particularly in lower congestion scenarios, it introduces an 81.4% increase in trained parameters, presenting a distinct trade-off between scheduling optimality and computational efficiency. ...

Efficient Production Scheduling Under Uncertainty

Combining Constraint Programming, Heuristics, and Monte Carlo Simulations in Industrial Scheduling

Master thesis (2026) - S. Banas, Neil Yorke-Smith, Yuki Murakami

Efficient production scheduling is a critical challenge in modern manufacturing, especially within complex environments characterised by strict operational constraints and inherent uncertainty. This thesis addresses the development and integration of a modular scheduling engine for Occator's OccaSee production management platform, a system previously dependent on manual or external scheduling tools. The challenge lies in balancing the computational complexity of industrial constraints—such as sequence-dependent changeovers, alternative resources, and distinct batch types—with the need for reliable decision support under stochastic disruptions.

To tackle this, a comprehensive scheduling framework was developed. First, a heuristic split-schedule-join paradigm is introduced to handle lot sizing, eliminating combinatorial explosion while preserving order-splitting flexibility. Second, a modular Constraint Programming (CP) model is formulated to optimise batch assignments and sequencing using a multi-step lexicographic approach prioritising lateness, transition costs, and total completion time. To ensure practical industrial viability and scalability, a custom greedy constructive heuristic is integrated as a warm-start mechanism for the CP solver. Finally, operational uncertainty is addressed through a Monte Carlo simulation engine that models historical duration deviations, forecasting schedule robustness and quantifying the value of reactive rescheduling.

Empirical evaluation across diverse static and disrupted instances demonstrates the framework's efficacy. The CP model assisted by the greedy warm-start (CP-WS) guarantees 100% feasibility on highly constrained instances, accelerates early convergence by a factor of 2.3, and reduces lateness and transition costs by approximately 50% compared to the greedy baseline. Furthermore, stochastic experiments reveal that while minor disruptions are best absorbed by existing schedule structures, targeted reactive rescheduling under severe delays recovers an average of over 800 minutes of accumulated lateness. By combining constraint programming, heuristic acceleration, and simulation-based risk analysis, this thesis transforms OccaSee into a dynamic decision support system capable of robust industrial scheduling. ...

Efficient production scheduling is a critical challenge in modern manufacturing, especially within complex environments characterised by strict operational constraints and inherent uncertainty. This thesis addresses the development and integration of a modular scheduling engine for Occator's OccaSee production management platform, a system previously dependent on manual or external scheduling tools. The challenge lies in balancing the computational complexity of industrial constraints—such as sequence-dependent changeovers, alternative resources, and distinct batch types—with the need for reliable decision support under stochastic disruptions.

To tackle this, a comprehensive scheduling framework was developed. First, a heuristic split-schedule-join paradigm is introduced to handle lot sizing, eliminating combinatorial explosion while preserving order-splitting flexibility. Second, a modular Constraint Programming (CP) model is formulated to optimise batch assignments and sequencing using a multi-step lexicographic approach prioritising lateness, transition costs, and total completion time. To ensure practical industrial viability and scalability, a custom greedy constructive heuristic is integrated as a warm-start mechanism for the CP solver. Finally, operational uncertainty is addressed through a Monte Carlo simulation engine that models historical duration deviations, forecasting schedule robustness and quantifying the value of reactive rescheduling.

Empirical evaluation across diverse static and disrupted instances demonstrates the framework's efficacy. The CP model assisted by the greedy warm-start (CP-WS) guarantees 100% feasibility on highly constrained instances, accelerates early convergence by a factor of 2.3, and reduces lateness and transition costs by approximately 50% compared to the greedy baseline. Furthermore, stochastic experiments reveal that while minor disruptions are best absorbed by existing schedule structures, targeted reactive rescheduling under severe delays recovers an average of over 800 minutes of accumulated lateness. By combining constraint programming, heuristic acceleration, and simulation-based risk analysis, this thesis transforms OccaSee into a dynamic decision support system capable of robust industrial scheduling.

Comparing Encoder Architectures for the Discrete Berth Allocation Problem

Testing a pure attention transformer mechanism as an alternative to the GAT encoder a Reinforcement Learning agent uses to solve the Discrete Berth Allocation Problem

Bachelor thesis (2026) - C. Radetchi, Neil Yorke-Smith, Carlos March Moya, J.W. Böhmer

This study looks at whether we can replace the Graph Neural Network (GNN)
encoder in a reinforcement learning framework with a direct attention mechanism to solve the Dynamic Discrete Berth Allocation Problem (DDBAP): the challenge of assigning ships to specific docking spots over time. We tested two alternative designs, a Pure Attention Transformer and a topology-aware Edge-Transformer, against a baseline Graph Attention Network (GAT) using 27 simulated shipping scenarios. On average, the baseline GNN performed relatively better overall, scoring an average cost of 646.58 compared to 650.72 for the Edge-Transformer and 658.63 for the Pure Transformer. However, the best choice depends entirely on the size of the problem. The GNN works best in low-traffic situations with plenty of available berths because it ignores irrelevant background information. In contrast, when the port gets highly congested, such as 120 ships competing for just 5 berths, the global attention mechanism performs much better because it can anticipate long-term queue delays. Finally, while the attention-based models take significantly longer to train over 30,000 samples, they both process decisions in less than a second during live testing. This makes them highly practical for real-time maritime scheduling. ...

Evaluating the Communication Reliability of Earth-Moon Satellite Constellations Using Model Counting

Master thesis (2026) - I. Băcălie, N. Yorke-Smith, A.L.D. Latour, Max Bannach, F.A. Oliehoek

As interest in lunar missions continues to grow, reliable communication between the Earth and the Moon becomes increasingly important. One way to achieve such communication is through satellite constellations, in which satellites communicate through inter-satellite links. However, evaluating the reliability of such constellations is a challenging problem due to continuously changing satellite positions.
In this thesis, we study how the communication reliability of Earth--Moon satellite constellations can be modelled and evaluated. We represent the communication network as a probabilistic graph, in which vertices correspond to satellites and edges correspond to communication links. We consider two reliability metrics: the probability that communication exists between Earth and Moon ground-connected satellites and the probability that the entire network is connected. We also study hop-constrained variants of these metrics, as in quantum communication the number of relay hops is limited.
To evaluate the communication reliability of Earth-Moon satellite constellations, we use projected weighted model counting. Starting from Walker constellations around both the Earth and the Moon, we propagate satellite positions over time. At selected discrete timestamps, we build probabilistic networks to represent the constellation at that point in time. We then encode communication properties as propositional formulas and apply projected weighted model counting to compute reliability. We compare two encodings: one based on reachability, adapted from existing work, and one that explicitly models communication paths.
We evaluate the proposed approach on multiple constellation configurations and timestamps. The experiments provide insights into the factors affecting communication reliability and the computational performance of the two encodings. The results show that projected weighted model counting can be used to evaluate the reliability of dynamic Earth--Moon satellite communication networks, while also highlighting scalability challenges for larger problem instances.
...

As interest in lunar missions continues to grow, reliable communication between the Earth and the Moon becomes increasingly important. One way to achieve such communication is through satellite constellations, in which satellites communicate through inter-satellite links. However, evaluating the reliability of such constellations is a challenging problem due to continuously changing satellite positions.
In this thesis, we study how the communication reliability of Earth--Moon satellite constellations can be modelled and evaluated. We represent the communication network as a probabilistic graph, in which vertices correspond to satellites and edges correspond to communication links. We consider two reliability metrics: the probability that communication exists between Earth and Moon ground-connected satellites and the probability that the entire network is connected. We also study hop-constrained variants of these metrics, as in quantum communication the number of relay hops is limited.
To evaluate the communication reliability of Earth-Moon satellite constellations, we use projected weighted model counting. Starting from Walker constellations around both the Earth and the Moon, we propagate satellite positions over time. At selected discrete timestamps, we build probabilistic networks to represent the constellation at that point in time. We then encode communication properties as propositional formulas and apply projected weighted model counting to compute reliability. We compare two encodings: one based on reachability, adapted from existing work, and one that explicitly models communication paths.
We evaluate the proposed approach on multiple constellation configurations and timestamps. The experiments provide insights into the factors affecting communication reliability and the computational performance of the two encodings. The results show that projected weighted model counting can be used to evaluate the reliability of dynamic Earth--Moon satellite communication networks, while also highlighting scalability challenges for larger problem instances.

The Robustness of the GNN approach in application of different data-generation method

How robust is the GNN approach when applied to instances generated using different data-generation methods or distributions?

Bachelor thesis (2026) - J. Lemut, Carlos March Moya, Neil Yorke-Smith

Sea cargo transportation relies on the efficiency of ports and how effectively ships are allocated to berths. This study evaluates the robustness of the Graph Neural Network proposed by Moya et. al. [7] for the Berth Allocation Problem. It uses real world scenarios and known cases to scrutinize the agent in comparison to common heuristics and a baseline optimal solution. While the agent created schedules that were in general on par with the baseline heuristics, the agent performed particularly poorly in changes to the handling times variable. ...

Reinforcement Learning for the Discrete Dynamic Berth Allocation Problem

Evaluating a trained Discrete Dynamic Berth Allocation model on Berth breakdowns

Bachelor thesis (2026) - T. Kuklys, C. March Moya, N. Yorke-Smith

The problem of scheduling vessels in a port as they arrive one-by-one is known as the Dynamic Berth Allocation Problem and it is NP-hard. This paper analyses the influence of berth breakdowns on the scheduling and optimality of a trained Reinforcement Learning model by March Moya et al. for such a problem. Several breakdown parameters, including frequency, severity, dura- tion, and the probability of binary versus partial breakdowns, were examined independently and in combination with one another with respect to different scheduling heuristics. The breakdowns were dynamically injected into the model’s event loop so that it did not have knowledge of upcoming breakdowns. Each experimental configuration was evaluated using ten random seeds and the sample mean and standard deviation were computed. The results showed low variance between different seeds and configurations. Breakdown frequency was the main factor limiting the model’s perfor- mance, moving the performance from a 19.6% advantage to 6.4% in the most extreme cases when compared to the baseline heuristic of WTSP. The other parameters did not produce significant model degradation. ...

Beyond Static Parameters: Adaptive Simulated Annealing in Practice

A case study on environmentally-aware lockage scheduling for the North Sea Locks

Master thesis (2026) - J. Heijne, Neil Yorke-Smith, Paweł Kołodziejczyk, Vasso Reppa

The North Sea Locks connect the saline Western Scheldt to the Terneuzen-Gent canal, requiring efficient scheduling of vessels. While current scheduling models minimise vessel delays, lockages also cause freshwater loss and salt intrusion into the canal, negatively affecting drinking water, agriculture, and the surrounding ecosystem. This thesis extends an existing simulated annealing scheduling model with water loss and salt intrusion objectives, and evaluates adaptive hyperparameter mechanisms to reduce the burden of empirical tuning. The results confirm a direct tradeoff between the environmental impact and vessel delays; incorporating environmental objectives yields significant reductions in freshwater loss and salt intrusion at the cost of increased vessel delays. Among the adaptive approaches, those guided by domain-specific information, such as tidal conditions and salinity levels, show performance improvements. General approaches from the literature, specifically the Modified Lam Annealing schedule and memory-based neighbourhood selection, do not transfer effectively to the North Sea Lock scheduling problem. These findings suggest that for domain-specific optimisation problems, problem-specific adaptive mechanisms outperform general-purpose ones, and that meaningful environmental gains in lock scheduling are achievable through optimisation alone. ...

Anomaly Detection in Geostationary Satellites

Unsupervised, Satellite-Agnostic Error Detection & Localisation

Master thesis (2026) - M.C. Bak, N. Yorke-Smith, J.G. De Teixeira da Encarnacao, Erwin Platen, Sytze Andringa

As Eugene Wigner showed in 1939 for the Poincaré algebra, fundamental particles can be classified using symmetry algebras. In a universe including gravity, the Poincaré algebra cannot be the correct symmetry algebra, as this is the symmetry algebra for flat space. Instead, one should consider a symmetry algebra of asymptotic symmetries, which preserve only the asymptotic structure of gravity. Many of these asymptotic symmetries are not physically useful and are therefore considered “trivial.” In this thesis, we give a new quantum definition of trivial symmetries, namely that a symmetry is trivial if it does not change which fundamental particles are found in a classification. We then specialize to three-dimensional asymptotically Anti-de-Sitter space. To calculate which symmetries are trivial, we first determine the second cohomology group of the asymptotic symmetries. Using the second cohomology group, it is then found that the useful asymptotic symmetry algebra is given by w⊕w⊕Rw \oplus w \oplus \mathbb{R}w⊕w⊕R, where www is the Witt algebra (centerless Virasoro) algebra, whereas the standard definition of trivial symmetry gives w⊕ww \oplus ww⊕w as the useful symmetry algebra. The extra factor of R\mathbb{R}R is interpreted as a kind of center-of-mass momentum. ...

Improving Inland and Short-Sea Vessel Scheduling using Constraint Optimization

A Google OR-Tools Implementation for a Container Vessel Planning System

Master thesis (2026) - D. Chou Rainho, N. Yorke-Smith, B. Atasoy, Quirijn Schevenhoven

Inland and short-sea container shipping in Northwestern Europe relies on manual planning by experienced logistics operators. This process, while effective for routine operations, is time-consuming, difficult to scale, and limited in its ability to globally optimize fleet utilization. The underlying scheduling problem is classified as a Heterogeneous Vehicle Routing Problem with Pickup and Delivery and Time Windows (VRPPDTW), incorporating domain-specific constraints such as minimum call sizes, forbidden terminals, terminal opening hours, and mandatory vessel breaks.

This thesis presents Orion, a constraint-based optimization solver built on the Google OR-Tools Routing Library, designed to automate and improve this vessel scheduling process. The problem is formulated using Constraint Programming, which allows complex business rules to be expressed as logical predicates rather than linearized inequalities. The solver pipeline includes a data preprocessing stage that aggregates individual container orders into compound orders, reducing problem size by approximately 90\% while preserving solution quality.

The evaluation follows a three-phase methodology. First, a feasibility analysis filters 14 construction heuristics down to two viable candidates: Local Cheapest Insertion and Parallel Cheapest Insertion. Second, a parameter tuning phase identifies Local Cheapest Insertion combined with Tabu Search as the best-performing configuration, achieving an Average Relative Percentage Deviation of 2.72\% across all scenarios. Third, a benchmarking phase compares Orion against human planners and the company's existing Simulated Annealing solver (Baseline SA) on six real-world scenarios from three distinct logistics operators.

The results show that Orion achieves travel cost reductions of 6.7\% to 32.8\% compared to human planners on five of six scenarios and outperforms Baseline SA's average result on five of six scenarios. A key structural advantage is determinism: Orion produces identical results across repeated runs, whereas Baseline SA exhibits variance of up to 37.5\% between its best and worst runs. Convergence analysis indicates that 96--99.97\% of objective improvement occurs within the first 30 seconds, making a 5-minute time budget sufficient for operational use.

The thesis concludes with recommendations for production deployment and identifies future research directions, including dynamic water level constraints, improved rolling-horizon replanning, custom fleet reduction operators, and hub-based consolidation strategies.
...

Inland and short-sea container shipping in Northwestern Europe relies on manual planning by experienced logistics operators. This process, while effective for routine operations, is time-consuming, difficult to scale, and limited in its ability to globally optimize fleet utilization. The underlying scheduling problem is classified as a Heterogeneous Vehicle Routing Problem with Pickup and Delivery and Time Windows (VRPPDTW), incorporating domain-specific constraints such as minimum call sizes, forbidden terminals, terminal opening hours, and mandatory vessel breaks.

This thesis presents Orion, a constraint-based optimization solver built on the Google OR-Tools Routing Library, designed to automate and improve this vessel scheduling process. The problem is formulated using Constraint Programming, which allows complex business rules to be expressed as logical predicates rather than linearized inequalities. The solver pipeline includes a data preprocessing stage that aggregates individual container orders into compound orders, reducing problem size by approximately 90\% while preserving solution quality.

The evaluation follows a three-phase methodology. First, a feasibility analysis filters 14 construction heuristics down to two viable candidates: Local Cheapest Insertion and Parallel Cheapest Insertion. Second, a parameter tuning phase identifies Local Cheapest Insertion combined with Tabu Search as the best-performing configuration, achieving an Average Relative Percentage Deviation of 2.72\% across all scenarios. Third, a benchmarking phase compares Orion against human planners and the company's existing Simulated Annealing solver (Baseline SA) on six real-world scenarios from three distinct logistics operators.

The results show that Orion achieves travel cost reductions of 6.7\% to 32.8\% compared to human planners on five of six scenarios and outperforms Baseline SA's average result on five of six scenarios. A key structural advantage is determinism: Orion produces identical results across repeated runs, whereas Baseline SA exhibits variance of up to 37.5\% between its best and worst runs. Convergence analysis indicates that 96--99.97\% of objective improvement occurs within the first 30 seconds, making a 5-minute time budget sufficient for operational use.

The thesis concludes with recommendations for production deployment and identifies future research directions, including dynamic water level constraints, improved rolling-horizon replanning, custom fleet reduction operators, and hub-based consolidation strategies.

Supply Chain Life Cycle Optimization under Uncertainty

Master thesis (2026) - P.G. van Mastrigt, Selin Hülagü, Reinout Heijungs, N. Yorke-Smith, A.A.J.F. van den Dobbelsteen, M.A. Sharifi Kolarijani

For most organizations, the majority of greenhouse gas emissions are Scope 3 emissions embedded in geographically dispersed supply chains. In such settings, environmental and economic impacts, as well as operating conditions, are uncertain, and decisions are sequential, meaning that early commitments constrain later operational choices. Yet current optimization practices do not integrate a full life cycle perspective in such conditions. In response, this thesis develops a stochastic Closed-Loop Supply Chain Life Cycle Optimization (SCLCO) framework that embeds the algebraic, matrix-based structure of life cycle assessment directly within a multistage stochastic mixed-integer program. Rather than assessing impacts ex post, the model minimizes expected environmental and economic impacts through sequential decisions while representing uncertainty in supplier-dependent life cycle inventory data and economic drivers, as well as stochastic demand and return processes. The framework is illustrated using a university campus circular furniture procurement case study. Applying the stochsatic SCLCO to a framework agreement yields decision-relevant insights for supplier selection under joint environmental–economic objectives. The resulting problem is solved using Sample Average Approximation. Trade-offs are explored via weighted objectives and summarized using a Pareto frontier; robustness is assessed through optimality-gap analysis and value-of-information metrics; and outcomes are examined using contribution, uncertainty, and sensitivity analyses. Results indicate that economic outcomes are driven primarily by demand uncertainty, whereas environmental outcomes are dominated by uncertainty in manufacturing-stage inventory data. Accordingly, this work offers a decision-support framework for decarbonization efforts throughout supply chains. ...

Bayesian Model-Free Deep Reinforcement Learning

Doctoral thesis (2026) - P.R. van der Vaart, M.T.J. Spaan, N. Yorke-Smith

The goal of reinforcement learning is to train agents to perform tasks under little supervision. Tasks are specified by a reward function and transition function, which state how much reward the agent gets for its action in a state, and how the environment state changes based on the action the agent took. Typically reinforcement learning assumes no prior knowledge over the reward and transition function,meaning that agents need to explore the environment and learn essentially through trial and error. Model-free methods attempt to learn which actions lead to good outcomes without modeling the reward or environments itself. Efficiently selecting that actions are promising is an active research direction which can greatly reduce the number of total interactions needed for an agent to learn the task, potentially opening the door to new applications where trials or simulations are expensive or compute is limited.
Uncertainty quantification is a central mechanism in such efficient exploration methods. Provided with an estimate of how certain the agent is about the outcome of an action, it can intelligently weigh whether it is worth exploring. The Bayesian paradigm is one method to quantify uncertainty in machine learning. It models the uncertainty with a probability distributions over models, specifying how likely a model is based on the data the agent has collected.
We adopt a Bayesian point of view in model-free reinforcement learning, and develop a deeper understanding on when Bayesian reinforcement learning methods can be expected to work well and challenges that remain. To this end, in Chapter 2 we propose training ensembles through Sequential Monte Carlo, obtaining a sample from the posterior distribution of a deep Q-learning agent. We observe that agents are able to perform directed exploration, although not necessarily more efficiently than standard ensembles in every environment. Furthermore, in Chapter 3 we theoretically analyze existing Bayesian Deep model-Free Reinforcement Learning methods, and unify them into a single theoretical framework we call Epistemic Bellman Operators. We prove that these operators are contractions, establishing convergence of derived algorithms in a simplified setting. Finally, in Chapter 4 we analyze the likelihood and prior assumptions
in existing Bayesian deep model-free reinforcement learning methods, and find through statistical tests that the standard likelihood assumptions are violated on every benchmark we tested. We also find that we can improve performance of Bayesian model free reinforcement learning methods by picking different priors based on empirical data from unrelated tasks, which transfer to new environments.
This dissertation establishes several desirable properties of Bayesian Deep model free reinforcement learning, but also raises some key issues, most notably misspecification in Chapter 4. We hope our findings convince other Bayesian reinforcement learning researchers to give more attention to assumptions about priors and likelihoods. ...

The goal of reinforcement learning is to train agents to perform tasks under little supervision. Tasks are specified by a reward function and transition function, which state how much reward the agent gets for its action in a state, and how the environment state changes based on the action the agent took. Typically reinforcement learning assumes no prior knowledge over the reward and transition function,meaning that agents need to explore the environment and learn essentially through trial and error. Model-free methods attempt to learn which actions lead to good outcomes without modeling the reward or environments itself. Efficiently selecting that actions are promising is an active research direction which can greatly reduce the number of total interactions needed for an agent to learn the task, potentially opening the door to new applications where trials or simulations are expensive or compute is limited.
Uncertainty quantification is a central mechanism in such efficient exploration methods. Provided with an estimate of how certain the agent is about the outcome of an action, it can intelligently weigh whether it is worth exploring. The Bayesian paradigm is one method to quantify uncertainty in machine learning. It models the uncertainty with a probability distributions over models, specifying how likely a model is based on the data the agent has collected.
We adopt a Bayesian point of view in model-free reinforcement learning, and develop a deeper understanding on when Bayesian reinforcement learning methods can be expected to work well and challenges that remain. To this end, in Chapter 2 we propose training ensembles through Sequential Monte Carlo, obtaining a sample from the posterior distribution of a deep Q-learning agent. We observe that agents are able to perform directed exploration, although not necessarily more efficiently than standard ensembles in every environment. Furthermore, in Chapter 3 we theoretically analyze existing Bayesian Deep model-Free Reinforcement Learning methods, and unify them into a single theoretical framework we call Epistemic Bellman Operators. We prove that these operators are contractions, establishing convergence of derived algorithms in a simplified setting. Finally, in Chapter 4 we analyze the likelihood and prior assumptions
in existing Bayesian deep model-free reinforcement learning methods, and find through statistical tests that the standard likelihood assumptions are violated on every benchmark we tested. We also find that we can improve performance of Bayesian model free reinforcement learning methods by picking different priors based on empirical data from unrelated tasks, which transfer to new environments.
This dissertation establishes several desirable properties of Bayesian Deep model free reinforcement learning, but also raises some key issues, most notably misspecification in Chapter 4. We hope our findings convince other Bayesian reinforcement learning researchers to give more attention to assumptions about priors and likelihoods.

Data-Driven and Robust Predictive Control and Optimization

With Applications to Building Energy Management

Doctoral thesis (2026) - Y. Li, T. Keviczky, N. Yorke-Smith

Buildings, as major global energy consumers, can help mitigate the impact of growing renewable energy in smart grids through demand-side management (DSM). Smart energy management of buildings requires advanced control schemes that can cope with economic objectives, environmental uncertainties, occupant comfort, physical constraints, and external communication signals. Robust optimization (RO) and model predictive control (MPC) provide systematic and effective frameworks for this purpose. Beyond building energy management, RO and MPC methods are also fundamental to a broad range of engineering applications, such as chemical process planning, transportation, robotics, etc. This thesis focuses on RO and MPC design for linear systems as well as their applications in building energy management. The research is organized into three topics:

• MPC designs for building energy management to enable energy-flexible DSM and improve environmental sustainability (Chapters 2–4).
• data-driven RO designs and algorithmic solutions for linear models to reduce conservatism and improve computational efficiency (Chapters 5 & 6).
• a distributionally robust MPC design for constrained linear systems to robustify control performance against additive model uncertainties and disturbances (Chapter 7). ...

Optimising Discrete Problems

Decision Diagrams and Context-Aware Heuristics

Doctoral thesis (2026) - E. Eigbe, N. Yorke-Smith, B. De Schutter, M. Nasri Nasrabadi

Optimisation problems are all around us and play a critical role in the outcomes of various sectors of society including scheduling, logistics, network design, and resource allocation. In this thesis, we look at a subset of problems where some or all the choices to be made can only take on values belonging to a discrete set of values – a limitation which increases the difficulty of finding a solution in most cases. We handle this thesis in two parts: first zooming in on a particular class of problems – scheduling – and then on a particular solution method – decision diagrams.

In Part I of this dissertation, we consider the problem of scheduling in manufacturing systems. This popular discrete optimisation problem has been studied extensively; however, advancements in modern manufacturing systems present new challenges and opportunities. A major driver of the changes in manufacturing systems is the integration of the physical processes with computation, networking, and automated control capabilities. Thus, the domain of activities that can directly be decided upon and actuated from software has expanded. We look at one such activity in Chapter 3 namely,
sequence-dependent maintenance and propose solution methods that directly integrate maintenance and production planning.

Part II of this thesis looks into decision diagrams as a solution method for discrete optimisation problems. Decision diagrams have existed since the 1950s and were originally introduced as a means to represent boolean functions. Since then they have been used in different fields such as in circuit verification, knowledge representation, and most recently, operations research. While decision diagrams have already been shown to be very promising methods for solving optimisation problems, we push the field forward as follows. In Chapter 5, we propose a decision diagram model for the kind of scheduling problems tackled in Part I of this thesis. We further perform a comparative study of the consequences of heuristic decisions made during the decision diagram compilation process in Chapter 6 and integrate reinforcement learning with decision-diagram-based branch-and-bound in Chapter 7. In Chapter 8 we go further into considering uncertainty in optimisation problems. We focus on the paradigm of finding the best solution while accepting some level of risk, i.e., chance constrained optimisation and present a chance constrained decision diagram formulation. We further provide theoretical guarantees for instances with normally distributed variables.

The contributions of this thesis cover different aspects of solving discrete optimisation problems with a focus on scheduling and logistic applications. The hope is that these advances further the adoption of state of the art research in solving optimisation problems in real-world contexts. ...

Optimisation problems are all around us and play a critical role in the outcomes of various sectors of society including scheduling, logistics, network design, and resource allocation. In this thesis, we look at a subset of problems where some or all the choices to be made can only take on values belonging to a discrete set of values – a limitation which increases the difficulty of finding a solution in most cases. We handle this thesis in two parts: first zooming in on a particular class of problems – scheduling – and then on a particular solution method – decision diagrams.

In Part I of this dissertation, we consider the problem of scheduling in manufacturing systems. This popular discrete optimisation problem has been studied extensively; however, advancements in modern manufacturing systems present new challenges and opportunities. A major driver of the changes in manufacturing systems is the integration of the physical processes with computation, networking, and automated control capabilities. Thus, the domain of activities that can directly be decided upon and actuated from software has expanded. We look at one such activity in Chapter 3 namely,
sequence-dependent maintenance and propose solution methods that directly integrate maintenance and production planning.

Part II of this thesis looks into decision diagrams as a solution method for discrete optimisation problems. Decision diagrams have existed since the 1950s and were originally introduced as a means to represent boolean functions. Since then they have been used in different fields such as in circuit verification, knowledge representation, and most recently, operations research. While decision diagrams have already been shown to be very promising methods for solving optimisation problems, we push the field forward as follows. In Chapter 5, we propose a decision diagram model for the kind of scheduling problems tackled in Part I of this thesis. We further perform a comparative study of the consequences of heuristic decisions made during the decision diagram compilation process in Chapter 6 and integrate reinforcement learning with decision-diagram-based branch-and-bound in Chapter 7. In Chapter 8 we go further into considering uncertainty in optimisation problems. We focus on the paradigm of finding the best solution while accepting some level of risk, i.e., chance constrained optimisation and present a chance constrained decision diagram formulation. We further provide theoretical guarantees for instances with normally distributed variables.

The contributions of this thesis cover different aspects of solving discrete optimisation problems with a focus on scheduling and logistic applications. The hope is that these advances further the adoption of state of the art research in solving optimisation problems in real-world contexts.

Reusable Programming Language Components

Doctoral thesis (2026) - C.R. van der Rest, A. van Deursen, N. Yorke-Smith, C. Bach

Type systems are a tool for preventing software errors, by classifying (sub)terms according to how they are evaluated. This way, mistakes can be caught at compile-time, ruling out the existence of entire classes of mistakes altogether. Using a programming language with a strong type system to develop critical software can dramatically reduce the prevalence and impact of bugs.
In light of the potentially enormous impact of bugs, it is important that we can trust a type system to be succesful in preventing errors. A key property of type systems that reflects this criterium is type soundness, which establishes that “well-typed programs cannot go wrong”. That is, programs that are deemed safe by the type system should not exhibit certain wrong behaviour when executed. To gain trust in a type system’s ability to prevent errors, we can give a formal system of both a language’s type system and semantics, and mathematically verify that the type system is sound with respect to the defined semantics. While this provides airtight evidence that a type system is succesful in ruling out certain mistakes, the formal specification and verification of programming languages requires a formidable amount of time and expertise on behalf of the language designer, and is therefore infeasible in most cases.... ...

Deep Reinforcement Learning for Multi-Objective Airport Ground Handling

Master thesis (2025) - S.M. Shah, N. Yorke-Smith, Yaoxin Wu, J.W. Böhmer

Air transport has enormous impact on economic, social, and environmental factors worldwide. According to the International Air Transport Association significant year on year increases can be noticed recently, in both passenger and cargo traffic. However, with this increasing demand for air transport, airports are faced with a rising trend in congestion, delays, and other problematic inefficiencies. Although increasing demand is a factor in these challenges, airline or airport causes, such as ground handling, remain the second largest cause of delays. This highlights the need for efficient and optimal scheduling of ground handling processes to minimise delays, and thereby the negative impact this might have on the airlines or airports. Most existing studies address only simplified sub-problems of Airport Ground Handling (AGH), by relaxing constraints or by leaving them out, rather than tackling the complete problem. Furthermore, these approaches tend to focus on single objective optimisation of AGH, while in practice there can be many (conflicting) objectives that need to be optimised simultaneously. This research explores the possibility of extending a neural model, which is trained to optimise instances of AGH on a single objective, with a generic learning-based approach that approximates the Pareto set for multi-objective optimisation, and applying further hypothetical improvements to this combined model. The implemented models are compared against each other, heuristic approaches for multi-objective combinatorial optimisation, and against the original single-objective neural model. ...

Enhancing Issue Tracking Efficiency with AI-Driven Natural Language Processing: Improving Classification, Association and Resolution

Master thesis (2025) - V.A. Pocheva, N. Yorke-Smith, M. Izadi, René van den Berg, M.A. Costea, D. Spinellis

In large-scale engineering environments, efficient issue tracking is essential for timely problem resolution and knowledge reuse. However, manual classification and association of issue reports present scalability challenges, further complicated by inconsistent annotations and the absence of semantic linking mechanisms. This project investigates the application of Natural Language Processing and Artificial Intelligence to automate multi-label classification and discover meaningful semantic associations between technical issues. Over 70 model configurations were evaluated on a real-world industrial dataset, comparing classical models with transformer-based and deep learning approaches. DistilBERT achieved the highest Recall@5 (0.93), indicating strong performance in identifying relevant categories. Classical methods, such as TF-IDF combined with Logistic Regression, also performed well, offering a computationally efficient and interpretable option. For association discovery, approaches including lexical retrieval, embedding-based similarity, clustering-based filtering, and topic modelling were assessed using both quantitative metrics and expert review. Lexical (BM25) and embedding-based (SBERT + Cosine Similarity) methods offer complementary strengths, retrieving overlapping yet distinct sets of associations. Associations identified by both models were rated as useful in over 70% of cases by domain experts, suggesting that agreement between methods may serve as an indicator of relevance. While Copilot provided consistent relevance assessments, its ratings were often higher than those provided by human evaluators and did not always reflect their detailed assessments. These findings highlight the potential of combining lexical and semantic methods with human-in-the-loop validation to support scalable and accurate industrial applicability. ...

Uncertainty Based Exploration in Reinforcement Learning

Analyzing the Robustness of Bayesian Deep Q-Networks

Bachelor thesis (2025) - S. Schwartz, N. Yorke-Smith, P.R. van der Vaart, M.T.J. Spaan

Bayesian Deep Q-Networks (BDQN) have demonstrated superior exploration capabilities and performance in complex environments such as Atari games, yet their behavior in other simpler settings and their sensitivity to hyperparameters remain understudied. This work evaluates BDQN in both contextual bandit and reinforcement learning tasks, compares it against the standard ϵ-greedy exploration strategy and analyzes its hyperparameter sensitivity. Our results indicate that BDQN outperforms ϵ-greedy DQN in exploration-heavy environments, particularly Deep Sea with sparse rewards, but performs comparably in simpler tasks where exploration is less critical. Sensitivity analysis reveals that the forgetting factor (α) plays a central role in modulating
exploration, while other hyperparameters such as batch size also impact performance to varying degrees. These findings suggest BDQN is a promising strategy for complex tasks requiring persistent exploration, though it introduces additional tuning complexity. ...

Heuristic Optimization of Amazon Redshift Table Configurations

Focusing on Distribution Style, Sort Keys and Column Encodings in Amazon Redshift

Master thesis (2025) - X.L. Hu, N. Yorke-Smith, A. Katsifodimos, C. Lofi, Derek van den Broek

This thesis presents a comprehensive, heuristic cost-driven framework for optimizing database table configuration in Amazon Redshift focusing on distribution styles, sort keys and column encodings. Unlike existing approaches that treat optimization parameters independently, this research develops a sequential optimization methodology that captures complex interdependencies between configuration choices and their performance impacts across different data scales.

The study addresses four research questions examining individual parameter optimization strategies and their integrated effects on system performance. The experimental evaluation employs two datasets: a primary table containing 300 million data records across 23 columns where optimization is performed, and a secondary join table with 117.5 million data records across 12 columns that remains unchanged. Scale-dependent analysis is conducted using subsets of 10 million and 100 million data records selected from the primary dataset to enable controlled comparison across different data volumes.

Key findings demonstrate that table configuration optimization in Amazon Redshift exhibits pronounced scale-dependent performance characteristics across the experimental datasets, with three distinct performance regimes identified: a small-scale regime (10M data records) characterized by query-type dependent optimization effectiveness, a medium-scale regime (100M data records) showing optimization trade-off transitions, and a large-scale regime (300M data records) dominated by I/O and storage optimizations. The research reveals mixed optimization outcomes, with performance improvements ranging from 21 percent CPU reduction at small scales to 62 percent I/O improvement at large scales, while demonstrating that optimization strategies effective at one scale can become counterproductive at another. Overall, the framework shows variable success in parameter selection for distribution style, sort key and encoding selection.

The research identifies fundamental challenges in optimizing Amazon Redshift table configurations where internal algorithms remain opaque and optimization benefits exhibit non-linear scaling patterns across the tested different data volumes. While the framework provides valuable insights into scale-dependent optimization patterns, the mixed results highlight the complexity of achieving consistent performance improvements across different scales and query types. These findings challenge assumptions about uniform optimization benefits and emphasize the need for empirical validation approaches in cloud database optimization, providing practical insights for database administrators and theoretical foundations for developing adaptive optimization systems. ...

This thesis presents a comprehensive, heuristic cost-driven framework for optimizing database table configuration in Amazon Redshift focusing on distribution styles, sort keys and column encodings. Unlike existing approaches that treat optimization parameters independently, this research develops a sequential optimization methodology that captures complex interdependencies between configuration choices and their performance impacts across different data scales.

The study addresses four research questions examining individual parameter optimization strategies and their integrated effects on system performance. The experimental evaluation employs two datasets: a primary table containing 300 million data records across 23 columns where optimization is performed, and a secondary join table with 117.5 million data records across 12 columns that remains unchanged. Scale-dependent analysis is conducted using subsets of 10 million and 100 million data records selected from the primary dataset to enable controlled comparison across different data volumes.

Key findings demonstrate that table configuration optimization in Amazon Redshift exhibits pronounced scale-dependent performance characteristics across the experimental datasets, with three distinct performance regimes identified: a small-scale regime (10M data records) characterized by query-type dependent optimization effectiveness, a medium-scale regime (100M data records) showing optimization trade-off transitions, and a large-scale regime (300M data records) dominated by I/O and storage optimizations. The research reveals mixed optimization outcomes, with performance improvements ranging from 21 percent CPU reduction at small scales to 62 percent I/O improvement at large scales, while demonstrating that optimization strategies effective at one scale can become counterproductive at another. Overall, the framework shows variable success in parameter selection for distribution style, sort key and encoding selection.

The research identifies fundamental challenges in optimizing Amazon Redshift table configurations where internal algorithms remain opaque and optimization benefits exhibit non-linear scaling patterns across the tested different data volumes. While the framework provides valuable insights into scale-dependent optimization patterns, the mixed results highlight the complexity of achieving consistent performance improvements across different scales and query types. These findings challenge assumptions about uniform optimization benefits and emphasize the need for empirical validation approaches in cloud database optimization, providing practical insights for database administrators and theoretical foundations for developing adaptive optimization systems.

Algorithms for the Sharing Economy

An Economic Modelling Perspective on Distributed Exchange

Master thesis (2025) - R. Hellinga, N. Yorke-Smith, G. Iosifidis, E.A. Aivaloglou

This thesis investigates the gap between the theoretical ideals and practical realities of distributed exchange protocols in peer-to-peer sharing economies. While classical economic models and stylised trading mechanisms have been extensively studied, their assumptions often overlook the complexities and behavioural nuances of real-world agent-based markets. Conversely, more empirically grounded approaches tend to be highly case-specific, limiting their generalisability and disconnecting them from established economic concepts.
The central aim of this work is to develop and analyse an algorithmic economic model that more faithfully captures the dynamics of decentralised resource sharing — balancing fairness, efficiency, and resistance to manipulation, while retaining a useful level of abstraction.
In light of this purpose, the thesis performs a comparative theoretical analysis of centralised and distributed exchange mechanisms, and introduces a novel exchange market model that incorporates heterogeneous strategies, bounded rationality, and asynchronous timing. A series of numerical experiments evaluates the performance and robustness of different protocols under these features of sharing economies.
The results show that distributed, mixed-strategy protocols can achieve stable and desirable outcomes, however their success is sensitive to population diversity, limited information, and strategic behaviour. These findings highlight the importance of integrating behavioural aspects into protocol design, and provide insights towards building more robust models for the sharing economy. ...