Safe Reinforcement Learning for Automated Vehicles

Master Thesis (2020)
Author(s)

R. Cornet (TU Delft - Mechanical Engineering)

Contributor(s)

W. Pan – Mentor (TU Delft - Robust Robot Systems)

M. Wisse – Graduation committee member (TU Delft - Robust Robot Systems)

B. Shyrokau – Graduation committee member (TU Delft - Intelligent Vehicles)

Y. Zheng – Graduation committee member (TU Delft - Intelligent Vehicles)

Publication Year
2020
Language
English
Graduation Date
27-08-2020
Awarding Institution
Programme
Mechanical Engineering, Vehicle Engineering
Collections
thesis
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Fully automated vehicles have the potential to increase road safety and improve traffic flow by taking the human element out of the driving loop. They can also provide mobility to people who are unable to operate a conventional vehicle. To be safe, automated vehicles must be able to respond to emergency situations and to drive on slippery roads in bad weather. It is therefore crucial to have a safe and robust control strategy that can use the full handling capabilities of the vehicle.

This thesis presents how safe reinforcement learning can be used to design a steering policy that can drive an automated vehicle at the limit of friction.

The steering policies are trained using the Lyapunov Safe Actor-Critic (LSAC) algorithm. LSAC is a combination of the Soft Actor-Critic (SAC) algorithm and a Lyapunov stability analysis to solve constrained control problems.
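
As a rough illustration of the kind of constraint a Lyapunov-based method can add on top of SAC, the sketch below evaluates a sampled Lyapunov decrease condition on a batch of transitions. It is not taken from the thesis: the function name, the `alpha` parameter, and the specific form of the condition (the mean of L(s') - L(s) + alpha * L(s) being non-positive) are assumptions chosen for illustration only.

```python
def lyapunov_decrease_violation(l_now, l_next, alpha=0.1):
    """Mean violation of a sampled Lyapunov decrease condition.

    Hypothetical helper, not from the thesis. l_now and l_next hold values
    of a Lyapunov candidate L at states s and at their successors s'.
    Assumed condition: mean of L(s') - L(s) + alpha * L(s) <= 0.
    A positive return value means the condition is violated on average.
    """
    if len(l_now) != len(l_next) or not l_now:
        raise ValueError("l_now and l_next must be non-empty and equal in length")
    terms = [ln - lc + alpha * lc for lc, ln in zip(l_now, l_next)]
    return sum(terms) / len(terms)


# Toy batch: L shrinks by 20% per step, so the condition holds for alpha = 0.1.
l_now = [4.0, 2.0, 1.0]
l_next = [0.8 * value for value in l_now]
violation = lyapunov_decrease_violation(l_now, l_next, alpha=0.1)
satisfied = violation <= 0.0
```

In a constrained actor-critic setting, a positive value of this quantity would typically be penalized during the policy update, for example through a Lagrange multiplier, while the rest of the training follows SAC. Whether LSAC uses exactly this form is not stated in the abstract.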

The performance of LSAC is tested in a vehicle simulator against SAC and Model Predictive Control (MPC) in a series of tests that include changing lanes at different speeds, recovering from a destabilizing collision, and driving on a race track at the limit of friction.

The experiments show that LSAC outperforms the MPC and SAC control strategies in terms of safety and vehicle stability. LSAC can recover from larger disturbances than MPC and SAC. A control strategy is presented that keeps the vehicle stable when driving at the limit of friction but can use the maneuverability of an unstable vehicle when it is necessary to avoid dangerous situations. Additionally, a policy is presented that can find the fastest way around a race track while staying within the track limits.
