Safe model-based Reinforcement Learning via Model Predictive Control and Control Barrier Functions

Master Thesis (2025)

Author(s)

K.B. Dzhumageldyev (TU Delft - Mechanical Engineering)

Contributor(s)

A. Dabiri – Mentor (TU Delft - Team Azita Dabiri)

F. Airaldi – Mentor (TU Delft - Team Azita Dabiri)

Faculty

Mechanical Engineering

RL MPC CBF

To reference this document use:

https://resolver.tudelft.nl/uuid:0ff883dc-3ab6-4774-835b-bb4b9801ad7e

More Info

Publication Year

2025

Language

English

Graduation Date

16-10-2025

Awarding Institution

Delft University of Technology

Programme

Mechanical Engineering | Systems and Control

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Optimal control strategies are often combined with safety certificates to ensure both performance and safety in safety-critical systems. A prominent example is combining Model Predictive Control (MPC) with Control Barrier Functions (CBF). Yet, tuning MPC parameters and choosing an appropriate class kappa function in the CBF is challenging and problem-dependent. This thesis introduces a safe model-based Reinforcement Learning (RL) framework where a parameterized MPC incorporates a CBF with a parameterized class kappa function and serves as a function approximator to learn improved safe control policies. Three variations are introduced, distinguished by the way the class kappa function is parameterized. The Learnable Optimal Decay CBF (LOPTD-CBF) extends the Optimal Decay CBF by allowing RL to tune the optimal decay parameters, improving performance while enhancing constraint feasibility and preserving safety guarantees. The Neural Network CBF (NN-CBF) parameterizes the decay term of a discrete exponential CBF with a neural network, enabling richer state-dependent safety conditions. Finally, the Recurrent Neural Network CBF (RNN-CBF) extends the NN-CBF with a recurrent architecture to handle time-varying CBF constraints, such as moving obstacles. Numerical experiments on a discrete double integrator with static and dynamic obstacles demonstrate that the proposed methods improve performance while ensuring safety, each offering distinct trade-offs in performance, feasibility and complexity.
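To make the role of the decay term concrete, the following is a minimal sketch of a discrete exponential CBF condition on a discrete double integrator. It is not taken from the thesis: the sampling time `DT`, the safety function `h` (a position limit `p_max`), the helper names `step` and `cbf_satisfied`, and the fixed scalar `gamma` are all illustrative assumptions. The condition used is the commonly cited form h(x_{k+1}) >= (1 - gamma) * h(x_k) with 0 < gamma <= 1; in the abstract's terms, gamma is the quantity that would be learned or replaced by a network output.

```python
import numpy as np

DT = 0.1  # assumed sampling time (not specified in the abstract)

# Discrete double integrator with state x = [position, velocity]
A = np.array([[1.0, DT],
              [0.0, 1.0]])
B = np.array([0.5 * DT**2, DT])


def step(x, u):
    """Advance the double integrator one step under scalar input u."""
    return A @ x + B * u


def h(x, p_max=1.0):
    """Hypothetical safety function: nonnegative while position <= p_max."""
    return p_max - x[0]


def cbf_satisfied(x, u, gamma):
    """Check the discrete exponential CBF condition h(x+) >= (1 - gamma) * h(x).

    gamma in (0, 1] is the decay term; a smaller gamma forces h to shrink
    more slowly, which makes the condition more conservative.
    """
    return bool(h(step(x, u)) >= (1.0 - gamma) * h(x))


if __name__ == "__main__":
    x0 = np.array([0.0, 1.0])  # at the origin, moving toward the limit
    for gamma in (0.5, 0.05):
        print(gamma, cbf_satisfied(x0, 0.0, gamma), cbf_satisfied(x0, -20.0, gamma))
```

In this toy setting, coasting (`u = 0`) satisfies the condition for the looser `gamma = 0.5` but violates it for `gamma = 0.05`, where only a braking input is admissible. This illustrates why the choice of decay term is described as problem-dependent and is a natural target for tuning.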

Files

MScThesisKerim_Final.pdf

(pdf | 9.45 Mb)

License info not available