Safe model-based Reinforcement Learning via Model Predictive Control and Control Barrier Functions

Master Thesis (2025)
Author(s)

K.B. Dzhumageldyev (TU Delft - Mechanical Engineering)

Contributor(s)

A. Dabiri – Mentor (TU Delft - Team Azita Dabiri)

F. Airaldi – Mentor (TU Delft - Team Azita Dabiri)

Faculty
Mechanical Engineering
Publication Year
2025
Language
English
Graduation Date
16-10-2025
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Systems and Control
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Optimal control strategies are often combined with safety certificates to ensure both performance and safety in safety-critical systems. A prominent example is the combination of Model Predictive Control (MPC) with Control Barrier Functions (CBFs). Yet tuning the MPC parameters and choosing an appropriate class kappa function for the CBF is challenging and problem-dependent. This thesis introduces a safe model-based Reinforcement Learning (RL) framework in which a parameterized MPC incorporates a CBF with a parameterized class kappa function and serves as a function approximator to learn improved safe control policies. Three variations are introduced, distinguished by the way the class kappa function is parameterized. The Learnable Optimal Decay CBF (LOPTD-CBF) extends the Optimal Decay CBF by allowing RL to tune the optimal decay parameters, improving performance while enhancing constraint feasibility and preserving safety guarantees. The Neural Network CBF (NN-CBF) parameterizes the decay term of a discrete exponential CBF with a neural network, enabling richer state-dependent safety conditions. Finally, the Recurrent Neural Network CBF (RNN-CBF) extends the NN-CBF with a recurrent architecture to handle time-varying CBF constraints, such as those arising from moving obstacles. Numerical experiments on a discrete-time double integrator with static and dynamic obstacles demonstrate that the proposed methods improve performance while ensuring safety, each offering distinct trade-offs in performance, feasibility, and complexity.
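
To make the role of the decay term concrete, the following is a minimal sketch of a discrete exponential CBF condition on a double integrator. It is not taken from the thesis: it assumes the commonly used form h(x_{k+1}) >= (1 - gamma) h(x_k) with a fixed gamma in (0, 1], a 2D point mass with state [px, py, vx, vy], a sample time of 0.1, and a circular static obstacle. The names `step`, `h`, and `cbf_residual` are hypothetical. In the framework described in the abstract, the decay term would be learned by RL (or produced by a neural network) and the condition would be imposed as a constraint inside the MPC rather than checked after the fact.

```python
import numpy as np

DT = 0.1  # assumed sample time, not taken from the thesis

# Discrete-time double integrator for a 2D point mass (assumed model).
A = np.block([[np.eye(2), DT * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
B = np.vstack([0.5 * DT**2 * np.eye(2), DT * np.eye(2)])


def step(x, u):
    """One step of the dynamics x_{k+1} = A x_k + B u_k."""
    return A @ x + B @ u


def h(x, center, radius):
    """Barrier function: positive outside the circular obstacle."""
    return float(np.sum((x[:2] - center) ** 2) - radius**2)


def cbf_residual(x, u, gamma, center, radius):
    """Residual of h(x_{k+1}) >= (1 - gamma) * h(x_k).

    A non-negative value means the input u satisfies the assumed
    discrete exponential CBF condition at state x.
    """
    return h(step(x, u), center, radius) - (1.0 - gamma) * h(x, center, radius)


if __name__ == "__main__":
    x0 = np.array([-2.0, 0.0, 1.0, 0.0])  # moving toward the obstacle
    u0 = np.zeros(2)                      # no braking
    obstacle = np.array([0.0, 0.0])
    for gamma in (0.1, 0.2):
        print(gamma, cbf_residual(x0, u0, gamma, obstacle, 1.0))
```

With these assumed numbers, coasting toward the obstacle satisfies the condition for gamma = 0.2 (residual of approximately 0.21) but violates it for gamma = 0.1 (residual of approximately -0.09). This illustrates why the choice of decay term is a trade-off: a smaller gamma forces earlier braking, while a larger gamma permits closer approaches.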

Files

MScThesisKerim_Final.pdf
(pdf | 9.45 Mb)
License info not available