Model-Based Safe Reinforcement Learning With Time-Varying Constraints

Applications to Intelligent Vehicles

Journal Article (2024)

Author(s)

Xinglong Zhang (National University of Defense Technology)

Yaoqian Peng (National University of Defense Technology)

Bo Luo (Central South University China)

W. Pan (TU Delft - Robot Dynamics)

Xin Xu (National University of Defense Technology)

Haibin Xie (National University of Defense Technology)

Research Group

Robot Dynamics

DOI related publication

https://doi.org/10.1109/TIE.2023.3317853

Safety; Convergence; Reinforcement learning; Heuristic algorithms; Vehicle dynamics; Optimal control; Time-varying systems; Barrier force; Multistep policy evaluation; Safe reinforcement learning (RL); Time-varying constraints

To reference this document use:

https://resolver.tudelft.nl/uuid:7a5f0f40-919e-4533-936f-10d5e7e9689b

Publication Year

2024

Language

English

Issue number

10

Volume number

71

Pages (from-to)

12744-12753

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In recent years, safe reinforcement learning (RL) with the actor-critic structure has gained significant interest for continuous control tasks. However, achieving near-optimal control policies with safety and convergence guarantees remains challenging. Moreover, few works have focused on designing RL algorithms that handle time-varying safety constraints. This article proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. The algorithm's novelty lies in two key aspects. First, the approach introduces a barrier force-based control policy structure to ensure control safety during learning. Second, a multistep policy evaluation mechanism is employed, enabling the prediction of policy safety risks under time-varying constraints and guiding safe updates. Theoretical results on learning convergence, stability, and robustness are proven. The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. It is also applied to the real-world problem of integrated path following and collision avoidance for two intelligent vehicles: a differential-drive vehicle and an Ackermann-drive vehicle. The experimental results demonstrate the sim-to-real transfer capability of the approach and satisfactory online control performance.

Files

Model-Based_Safe_Reinforcement... (pdf)

(pdf | 2.02 Mb)