F. Airaldi | TU Delft Repository

Model-based reinforcement learning for predictive control and optimisation

Doctoral thesis (2026) - F. Airaldi, B. De Schutter, A. Dabiri

In the current age of emerging autonomous and artificial-intelligence-driven machines, sequential decision making constitutes one of the theoretical foundations at the core of intelligent agency. As these systems are increasingly deployed in real-world engineering applications (e.g., autonomous vehicles and drones, as well as smart energy grids and greenhouses), there is a growing need for the control architectures governing these agents to meet not only traditional performance requirements but also interpretability and safety criteria, while encouraging adaptability and scalability. Classical model-based methodologies, such as Model Predictive Control (MPC), generally provide rigorous frameworks that integrate a priori knowledge (e.g., via explicit, though often approximate, prediction models) and can handle constraints to enforce safety, yet their performance is tightly coupled to the accuracy of the underlying model and to expert manual tuning of its parameters. Conversely, purely model-free approaches, such as deep Reinforcement Learning (RL), offer remarkable data-driven adaptation, but often lack the interpretability and reliable constraint handling required to provide formal guarantees.
This work expands on the current state-of-the-art results that combine these two distinct approaches into a single framework. It is well known that endowing these decision-making processes with model-based knowledge, while not always straightforward, can not only enhance their performance but also benefit their interpretability and analysis: model-based RL, also known as Approximate Dynamic Programming (ADP), is perhaps the most renowned machine learning paradigm for crafting these intelligent predictive agents. This dissertation aims to look at RL from a different perspective: rather than serving as an alternative to model-based control, RL is used as a performance-enhancing mechanism operating within rigorously defined safety requirements. Concurrently, this thesis establishes MPC as a unifying and scalable foundation for learning-based control and optimisation in constrained, uncertain, and distributed decision-making systems...

Validation of a Deep Learning-Assisted Evaluation of Total Corneal Endothelial Cells Viability

Journal article (2025) - Matteo Airaldi, Filippo Airaldi, Vito Romano, Zhuangzhi Gao, Alessandro Ruzza, Mohit Parekh, Diego Ponzin, Stephen Kaye, Francesco Semeraro, Stefano Ferrari, Yalin Zheng

Purpose: To describe the validation of a novel automated analysis of preoperative pan-corneal endothelial cell viability. Methods: Preclinical experimental study. Dead endothelial cells and denuded areas of Descemet membrane of corneoscleral rims were stained with trypan blue (TB) 0.05%. Endothelial mortality was estimated by an experienced eye bank technician ("gold standard") and by deep learning-aided automated segmentation of TB-positive areas (TBPAs) on images of the stained corneas ("V-CHECK method"). V-CHECK mortality was calculated for the whole cornea and for concentric 2-mm rings. The agreement in the estimation of endothelial mortality between the two methods was assessed with Bland-Altman analysis and correlation tests. Results: Nineteen corneas deemed unsuitable for transplantation were used for the experiment. The automated V-CHECK method was able to accurately segment the corneal endothelium and the TBPAs. The gold standard and the V-CHECK method showed a strong positive correlation for all rings (Pearson's ρ, range 0.76-0.81, all P < 0.001). The V-CHECK method resulted in a higher average estimated endothelial mortality (mean difference range +6.5% to +9.5%). Conclusions: The V-CHECK method enables reproducible estimation of endothelial cell viability in donor corneas. Incorporating this technique into the preoperative assessment of donor corneal tissues (in the eye bank and in the operating theater) can provide a reliable evaluation of endothelial health, thereby improving the consistency of tissue quality and further supporting optimal surgical results. Translational Relevance: The V-CHECK deep learning-assisted computer vision protocol will allow surgeons and eye bank technicians to perform an independent, preoperative assessment of global corneal endothelial viability.
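
The Bland-Altman agreement analysis mentioned in the abstract can be illustrated with a minimal sketch. The function name `bland_altman` and the numbers below are hypothetical and are not data from the study; the sketch only shows how a mean difference (bias) and 95% limits of agreement between two measurement methods are typically computed.

```python
import numpy as np

def bland_altman(reference, candidate):
    """Return the mean difference (bias) and the 95% limits of agreement."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    diff = candidate - reference      # per-sample disagreement between methods
    bias = diff.mean()                # systematic offset of candidate vs. reference
    sd = diff.std(ddof=1)             # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical mortality estimates in percent (illustrative only).
technician = [10.0, 15.0, 22.0, 30.0, 41.0]
automated = [17.0, 23.0, 30.0, 37.0, 49.0]
bias, lower, upper = bland_altman(technician, automated)
```

A positive bias in such a sketch would correspond to the automated method reporting higher mortality on average than the reference, which is the direction of the difference the abstract reports.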

Probabilistically safe and efficient model-based reinforcement learning

Conference paper (2025) - F. Airaldi, B. De Schutter, A. Dabiri

This paper proposes tackling safety-critical stochastic Reinforcement Learning (RL) tasks with a sample-based, model-based approach. At the core of the method lies a Model Predictive Control (MPC) scheme that acts as a function approximator, providing a model-based predictive control policy. To ensure safety, a probabilistic Control Barrier Function (CBF) is integrated into the MPC controller. To approximate the effects of stochasticities in the optimal control formulation and to fulfil the probabilistic CBF condition, a sample-based approach with guarantees is employed. Furthermore, to counterbalance the additional computational burden due to sampling, a learnable terminal cost formulation is included in the MPC objective. An RL algorithm is deployed to learn both the terminal cost and the CBF constraint. Results from a numerical experiment on a constrained LTI problem corroborate the effectiveness of the proposed methodology in reducing computation time while preserving control performance and safety.
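
To give a feel for what a sample-based check of a probabilistic CBF condition might look like, the sketch below estimates, by plain Monte Carlo, how often a common discrete-time CBF inequality h(x+) >= (1 - gamma) h(x) holds for an LTI system under sampled disturbances. This is an illustration under assumptions, not the paper's formulation: the system matrices, the barrier function `h`, the decay rate `gamma`, and the function name `cbf_satisfaction_rate` are all hypothetical, and a plain empirical frequency carries none of the guarantees the abstract refers to.

```python
import numpy as np

def cbf_satisfaction_rate(x, u, A, B, h, gamma, disturbances):
    """Estimate P[h(x+) >= (1 - gamma) * h(x)] from disturbance samples."""
    threshold = (1.0 - gamma) * h(x)
    # One successor state per sampled disturbance (rows of `disturbances`).
    successors = (A @ x + B @ u) + disturbances
    values = np.array([h(x_next) for x_next in successors])
    return float(np.mean(values >= threshold))

# Hypothetical LTI system and safe set {x : x[0] <= 1} (assumptions).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
h = lambda x: 1.0 - x[0]

rng = np.random.default_rng(0)
samples = rng.normal(scale=0.05, size=(1000, 2))
x0, u0 = np.array([0.5, 0.0]), np.array([0.0])
rate = cbf_satisfaction_rate(x0, u0, A, B, h, 0.5, samples)
```

In an MPC setting, a condition of this kind would be imposed as a constraint over the sampled scenarios rather than evaluated after the fact as done here.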

Reinforcement learning-based model predictive control for greenhouse climate control

Journal article (2025) - Samuel Mallick, Filippo Airaldi, Azita Dabiri, Congcong Sun, Bart De Schutter

Greenhouse climate control is concerned with maximizing performance in terms of crop yield and resource efficiency. One promising approach is model predictive control (MPC), which leverages a model of the system to optimize the control inputs, while enforcing physical constraints. However, prediction models for greenhouse systems are inherently inaccurate due to the complexity of the real system and the uncertainty in predicted weather profiles. For model-based control approaches such as MPC, this can degrade performance and lead to constraint violations. Existing approaches address uncertainty in the prediction model with robust or stochastic MPC methodology; however, these necessarily reduce crop yield due to conservatism and often bear higher computational loads. In contrast, learning-based control approaches, such as reinforcement learning (RL), can handle uncertainty naturally by leveraging data to improve performance. This work proposes an MPC-based RL control framework to optimize the climate control performance in the presence of prediction uncertainty. The approach employs a parametrized MPC scheme that learns directly from data, in an online fashion, the parametrization of the constraints, prediction model, and optimization cost that minimizes constraint violations and maximizes climate control performance. Simulations show that the approach can learn an MPC controller that significantly outperforms the current state of the art in terms of constraint violations and efficient crop growth.

Reinforcement Learning With Model Predictive Control for Highway Ramp Metering

Journal article (2025) - Filippo Airaldi, Bart De Schutter, Azita Dabiri

Against the backdrop of an increasingly pressing need for effective urban and highway transportation systems, this work explores the synergy between model-based and learning-based strategies to enhance traffic flow management. It takes an innovative approach to ramp metering control that embeds Reinforcement Learning (RL) techniques within the Model Predictive Control (MPC) framework. The control problem is formulated as an RL task by crafting a suitable stage cost function that is representative of the traffic conditions, variability in the control action, and violations of the constraint on the maximum number of vehicles in queue. An MPC-based RL approach, which leverages the MPC optimal control problem as a function approximator for the RL algorithm, is proposed to learn to efficiently control an on-ramp and satisfy its constraints despite uncertainties in the system model and variable demands. Simulations are performed on a benchmark small-scale highway network to compare the proposed methodology against other state-of-the-art control approaches. Results show that, starting from an MPC controller that has an imprecise model and is poorly tuned, the proposed methodology effectively learns to improve the control policy such that congestion in the network is reduced and constraints are satisfied, yielding performance superior to that of the other controllers.
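
The three ingredients of the stage cost named in the abstract (traffic conditions, control variability, and queue-constraint violations) could be combined as in the minimal sketch below. The function name, the weights, the quadratic penalty on control changes, and the use of a single scalar traffic measure are all assumptions made for illustration and are not taken from the paper.

```python
def stage_cost(traffic_measure, u, u_prev, queue, queue_max,
               w_traffic=1.0, w_var=0.1, w_viol=10.0):
    """Weighted sum of a traffic term, a control-variability term,
    and a penalty on exceeding the maximum queue length (all hypothetical)."""
    variability = (u - u_prev) ** 2            # penalises abrupt metering changes
    violation = max(0.0, queue - queue_max)    # zero while the queue limit is respected
    return w_traffic * traffic_measure + w_var * variability + w_viol * violation

# Hypothetical values: the queue limit is respected in the first call only.
cost_ok = stage_cost(2.0, 0.5, 0.5, queue=40.0, queue_max=50.0)
cost_violated = stage_cost(2.0, 0.5, 0.5, queue=55.0, queue_max=50.0)
```

Penalising only the positive part of the queue excess keeps the cost unchanged while the constraint holds, so the learning signal reflects violations rather than proximity to the limit.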

Multi-agent reinforcement learning via distributed MPC as a function approximator

Journal article (2024) - Samuel Mallick, Filippo Airaldi, Azita Dabiri, Bart De Schutter

This paper presents a novel approach to multi-agent reinforcement learning (RL) for linear systems with convex polytopic constraints. Existing work on RL has demonstrated the use of model predictive control (MPC) as a function approximator for the policy and value functions. The current paper is the first work to extend this idea to the multi-agent setting. We propose the use of a distributed MPC scheme as a function approximator, with a structure allowing for distributed learning and deployment. We then show that Q-learning updates can be performed distributively without introducing nonstationarity, by reconstructing a centralized learning update. The effectiveness of the approach is demonstrated on a numerical example.
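
For orientation, the centralized Q-learning update that the abstract says is reconstructed can be written, in its standard semi-gradient form for a cost-minimising agent, as below. The symbols are assumptions introduced here rather than the paper's notation: θ is the learnable parametrization, L the stage cost, γ the discount factor, and α the learning rate, while Q_θ and V_θ stand for the action-value and value functions, which in MPC-based RL would be supplied by the optimal values of the parametric MPC problem. The distributed reconstruction itself is not sketched.

```latex
\begin{aligned}
  \delta_k &= L(s_k, a_k) + \gamma \, V_\theta(s_{k+1}) - Q_\theta(s_k, a_k),
  \qquad V_\theta(s) = \min_{a} Q_\theta(s, a), \\
  \theta &\leftarrow \theta + \alpha \, \delta_k \, \nabla_\theta Q_\theta(s_k, a_k).
\end{aligned}
```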

Learning safety in model-based Reinforcement Learning using MPC and Gaussian Processes

Journal article (2023) - Filippo Airaldi, Bart De Schutter, Azita Dabiri

This paper proposes a method to encourage safety in Model Predictive Control (MPC)-based Reinforcement Learning (RL) via Gaussian Process (GP) regression. The framework consists of 1) a parametric MPC scheme that is employed as a model-based controller with approximate knowledge of the real system's dynamics, 2) an episodic RL algorithm tasked with adjusting the MPC parametrization in order to increase its performance, and 3) GP regressors used to estimate, directly from data, constraints on the MPC parameters capable of predicting, up to some probability, whether the parametrization is likely to yield a safe or unsafe policy. These constraints are then enforced onto the RL updates in an effort to enhance the learning method with a probabilistic safety mechanism. Compared to other recent publications combining safe RL with MPC, our method does not require further assumptions on, e.g., the prediction model in order to retain computational tractability. We illustrate the results of our method in a numerical example on the control of a quadrotor drone in a safety-critical environment.
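
The idea of using GP regression to predict, up to some probability, whether a parametrization is safe can be sketched as follows. This is a generic illustration under assumptions, not the paper's method: it uses a scalar parameter, a squared-exponential kernel with fixed hyperparameters, and hypothetical observations of a safety margin (positive meaning safe); all names and data are invented for the example.

```python
import math
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two 1-D arrays of parameters."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4, length=1.0):
    """Posterior mean and variance of a zero-mean GP at the query points."""
    K = rbf(x_train, x_train, length) + noise * np.eye(len(x_train))
    K_s = rbf(x_query, x_train, length)
    mean = K_s @ np.linalg.solve(K, y_train)
    cov = rbf(x_query, x_query, length) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.clip(np.diag(cov), 0.0, None)  # clip tiny negative values

def prob_safe(mean, var):
    """P(margin >= 0) under the Gaussian posterior at each query point."""
    std = np.sqrt(var) + 1e-12
    return np.array([0.5 * (1.0 + math.erf(m / (s * math.sqrt(2.0))))
                     for m, s in zip(mean, std)])

# Hypothetical data: safety margin observed for four parameter values.
theta = np.array([0.0, 1.0, 2.0, 3.0])
margin = np.array([1.0, 0.5, -0.2, -1.0])
mean, var = gp_posterior(theta, margin, theta)
p = prob_safe(mean, var)
```

A parameter update could then be accepted only if its predicted probability of safety exceeds a chosen threshold, which is one simple way to turn such a regressor into a constraint on learning.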