Model-Based Safe Reinforcement Learning With Time-Varying Constraints

Applications to Intelligent Vehicles

Journal Article (2024)
Author(s)

Xinglong Zhang (National University of Defense Technology)

Yaoqian Peng (National University of Defense Technology)

Bo Luo (Central South University, China)

W. Pan (TU Delft - Robot Dynamics)

Xin Xu (National University of Defense Technology)

Haibin Xie (National University of Defense Technology)

Research Group
Robot Dynamics
DOI
https://doi.org/10.1109/TIE.2023.3317853
Publication Year
2024
Language
English
Issue number
10
Volume number
71
Pages (from-to)
12744-12753
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In recent years, safe reinforcement learning (RL) with the actor-critic structure has gained significant interest for continuous control tasks. However, achieving near-optimal control policies with safety and convergence guarantees remains challenging, and few works have addressed RL algorithms that handle time-varying safety constraints. This article proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. The algorithm's novelty lies in two key aspects. First, it introduces a barrier force-based control policy structure to ensure control safety during learning. Second, it employs a multistep policy evaluation mechanism that predicts the policy's safety risk under time-varying constraints and guides safe policy updates. Theoretical results on learning convergence, stability, and robustness are proven. The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. It is also applied to the real-world problem of integrated path following and collision avoidance for two intelligent vehicles: a differential-drive vehicle and an Ackermann-drive one. The experimental results demonstrate the strong sim-to-real transfer capability of our approach and satisfactory online control performance.
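The paper's exact barrier force-based policy structure is not reproduced here. As a rough, hypothetical sketch of the general idea (all function names, the log-barrier-style force, and the gains are assumptions, not the authors' formulation), a learned nominal action can be augmented with a repulsive term that grows as the state nears a constraint boundary, and the result clipped to the control constraint:

```python
import math


def barrier_force(x, x_max, gain=1.0, eps=1e-6):
    """Hypothetical repulsive force for the state constraint x <= x_max.

    Modeled on the gradient of a log-barrier: the force magnitude grows
    as the margin to the boundary shrinks (the paper's actual barrier
    construction may differ).
    """
    margin = max(x_max - x, eps)  # eps keeps the force finite at the boundary
    return -gain / margin         # pushes the state away from x_max


def safe_action(x, u_nominal, x_max=1.0, u_limit=2.0):
    """Barrier-augmented policy: nominal RL action plus barrier correction,
    then saturated to satisfy the control constraint |u| <= u_limit."""
    u = u_nominal + barrier_force(x, x_max)
    return max(-u_limit, min(u_limit, u))
```

For example, far from the boundary (`x = 0.0`) the correction is mild, while near the boundary (`x = 0.99`) the barrier term dominates and the action saturates at the control limit. A time-varying constraint would correspond to passing a time-dependent `x_max` at each step.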