Learning safety in model-based Reinforcement Learning using MPC and Gaussian Processes

Journal Article (2023)

Author(s)

F. Airaldi (TU Delft - Team Azita Dabiri)

B. De Schutter (TU Delft - Delft Center for Systems and Control)

A. Dabiri (TU Delft - Team Azita Dabiri)

Research Group

Team Azita Dabiri

DOI related publication

https://doi.org/10.1016/j.ifacol.2023.10.563

Gaussian Processes, Learning-based Model Predictive Control, Safe Reinforcement Learning

To reference this document use:

https://resolver.tudelft.nl/uuid:1a65f718-dd96-4c65-85ca-66578fc5d248

Publication Year

2023

Language

English

Issue number

2

Volume number

56

Pages (from-to)

5759-5764

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper proposes a method to encourage safety in Model Predictive Control (MPC)-based Reinforcement Learning (RL) via Gaussian Process (GP) regression. The framework consists of 1) a parametric MPC scheme that is employed as a model-based controller with approximate knowledge of the real system's dynamics, 2) an episodic RL algorithm tasked with adjusting the MPC parametrization in order to increase its performance, and 3) GP regressors used to estimate, directly from data, constraints on the MPC parameters that predict, up to some probability, whether a given parametrization is likely to yield a safe or unsafe policy. These constraints are then enforced on the RL updates in an effort to equip the learning method with a probabilistic safety mechanism. Compared to other recent publications combining safe RL with MPC, our method does not require further assumptions on, e.g., the prediction model in order to retain computational tractability. We illustrate our method in a numerical example on the control of a quadrotor drone in a safety-critical environment.
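The third component, a GP that turns observed data into a probabilistic constraint on the MPC parameters, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a single scalar parameter `theta`, a hypothetical per-episode "safety margin" observation (positive meaning no constraint violation), a zero-mean GP with a squared-exponential kernel, and a simple step-halving rule in place of the constrained RL update. The data values, `beta`, and all function names are illustrative assumptions.

```python
import math

def rbf(a, b, length=0.5, var=1.0):
    """Squared-exponential kernel between two scalar parameters."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(thetas, margins, query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at `query`."""
    n = len(thetas)
    K = [[rbf(thetas[i], thetas[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(t, query) for t in thetas]
    mean = sum(k * a for k, a in zip(k_star, solve(K, margins)))
    var = rbf(query, query) - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))

def is_probably_safe(thetas, margins, query, beta=2.0):
    """Probabilistic constraint: pessimistic margin estimate must be >= 0."""
    mean, std = gp_posterior(thetas, margins, query)
    return mean - beta * std >= 0.0

def safe_update(theta, step, thetas, margins, max_halvings=10):
    """Shrink a proposed parameter step until the GP constraint holds.

    Stand-in for a constrained RL update (assumption, not the paper's rule).
    """
    for _ in range(max_halvings):
        candidate = theta + step
        if is_probably_safe(thetas, margins, candidate):
            return candidate
        step *= 0.5
    return theta  # keep the current parametrization if no safe step is found

# Hypothetical data: parametrizations tried so far and their observed margins.
thetas = [0.0, 0.25, 0.5, 0.75, 1.0]
margins = [1.0, 0.8, 0.5, 0.1, -0.4]

# A proposed update from 0.25 toward 1.0 is shortened to stay probably safe.
new_theta = safe_update(0.25, 0.75, thetas, margins)
```

Because the GP prior has zero mean and unit variance, parametrizations far from any data fail the check by construction, so the sketch treats unexplored regions as unsafe until evidence says otherwise. Raising `beta` makes the constraint more conservative.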

Files

1-s2.0-S2405896323009308-main.... (pdf)

(pdf | 0.538 Mb)