Safe Reinforcement Learning for Automated Vehicles

Cornet, R.

Master thesis (2020)

Authors

R. Cornet, Mechanical Engineering

Contributors

Wei Pan, Robot Dynamics (mentor)

M. Wisse, Robot Dynamics (graduation committee member)

Barys Shyrokau, Intelligent Vehicles (graduation committee member)

Yanggu Zheng, Intelligent Vehicles (graduation committee member)

Faculty

Mechanical Engineering

Reinforcement Learning, Automated driving, Automated Vehicles, Safe reinforcement learning

To reference this document use:

http://resolver.tudelft.nl/uuid:7bedb60a-ced8-4fcf-97ca-80208861a413

Published Date

27-08-2020

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Fully automated vehicles have the potential to increase road safety and improve traffic flow by taking the human element out of the driving loop. They can also provide mobility to people who are unable to operate a conventional vehicle. Safe automated vehicles must be able to respond to emergency situations and to drive on slippery roads in bad weather. It is therefore crucial to have a safe and robust control strategy that can use the full handling capabilities of the vehicle.

This thesis presents how safe reinforcement learning can be used to design a steering policy that can drive an automated vehicle at the limit of friction.
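To make "the limit of friction" concrete, the following minimal sketch uses a point-mass friction-circle approximation: the combined longitudinal and lateral acceleration cannot exceed mu * g. This is a textbook simplification assumed here for illustration, not the vehicle model used in the thesis; the function names and the friction coefficient values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def friction_utilization(a_long, a_lat, mu):
    """Fraction of the available tire-road friction in use.

    Point-mass friction-circle assumption: a value of 1.0 means the
    vehicle is driving exactly at the limit of friction.
    """
    return math.hypot(a_long, a_lat) / (mu * G)


def max_cornering_speed(radius, mu):
    """Highest steady-state speed (m/s) on a flat curve of the given radius (m).

    Follows from v^2 / r <= mu * g under the same point-mass assumption.
    """
    return math.sqrt(mu * G * radius)


if __name__ == "__main__":
    # Assumed illustrative friction coefficients: dry road vs. slippery road.
    for mu in (1.0, 0.3):
        v = max_cornering_speed(radius=100.0, mu=mu)
        print(f"mu={mu}: max cornering speed on a 100 m curve = {v:.1f} m/s")
```

A lower friction coefficient shrinks the usable acceleration envelope, which is why slippery roads bring a vehicle to its handling limit at much lower speeds.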

The steering policies are trained using the Lyapunov Safe Actor-Critic (LSAC) algorithm. LSAC combines the Soft Actor-Critic (SAC) algorithm with a Lyapunov stability analysis to solve constrained control problems.
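As a rough illustration of how a Lyapunov-style condition can be checked on sampled data in an actor-critic setting, the sketch below evaluates a generic decrease condition, L(s') - L(s) <= -alpha * c(s), averaged over a batch of transitions. The abstract does not state the exact condition used by LSAC, so this form, the function name, and the parameter `alpha` are assumptions made for illustration only.

```python
import numpy as np


def lyapunov_decrease_violation(l_now, l_next, cost, alpha=0.1):
    """Mean value of L(s') - L(s) + alpha * c(s) over a batch.

    l_now, l_next: Lyapunov candidate values at the current and next states.
    cost: non-negative constraint cost at the current states.
    A result <= 0 means the assumed decrease condition holds on average;
    a positive result means it is violated on average.
    """
    l_now = np.asarray(l_now, dtype=float)
    l_next = np.asarray(l_next, dtype=float)
    cost = np.asarray(cost, dtype=float)
    return float(np.mean(l_next - l_now + alpha * cost))


if __name__ == "__main__":
    # Hypothetical batch: the candidate decreases along every transition.
    value = lyapunov_decrease_violation(
        l_now=[2.0, 2.0], l_next=[1.0, 1.0], cost=[1.0, 1.0], alpha=0.5
    )
    print("mean violation:", value)
```

In a constrained actor-critic method, a quantity of this kind would typically enter the policy update as a penalty or a Lagrangian term, so that the policy is pushed toward actions that keep the condition satisfied.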

The performance of LSAC is tested in a vehicle simulator against SAC and Model Predictive Control (MPC) in a series of tests that include changing lanes at different speeds, recovering from a destabilizing collision, and driving on a race track at the limit of friction.

The experiments show that LSAC outperforms both the MPC and SAC control strategies in terms of safety and vehicle stability. LSAC can recover from larger disturbances than MPC and SAC. A control strategy is presented that keeps the vehicle stable when driving at the limit of friction but can use the maneuverability of an unstable vehicle when it is necessary to avoid dangerous situations. Additionally, a policy is presented that can find the fastest way around a race track while staying within the track limits.

Files

Thesis_Robert_Cornet_4302087.p... (pdf)

(pdf | 6.07 Mb)

License info not available