Safe Navigation in Dense Traffic Scenarios using Reinforcement Learning as Global Guidance for a Model Predictive Controller

Master Thesis (2020)
Author(s)

A. Agarwal (TU Delft - Mechanical Engineering)

Contributor(s)

J. Alonso-Mora – Mentor (TU Delft - Learning & Autonomous Control)

B. Brito – Mentor (TU Delft - Learning & Autonomous Control)

Faculty
Mechanical Engineering
Copyright
© 2020 Achin Agarwal
Publication Year
2020
Language
English
Graduation Date
14-12-2020
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The successful integration of autonomous vehicles (AVs) in human environments depends heavily on their ability to navigate safely and in a timely manner through dense traffic. Such conditions involve a diverse range of human behaviors, from cooperative drivers (willing to yield) to non-cooperative drivers (unwilling to yield), which must be identified without any explicit inter-vehicle communication. To maneuver through such conditions, an AV must not only compute a collision-free trajectory but also account for the effects of its actions on the surrounding agents in order to negotiate the navigation maneuver safely. Existing motion planning techniques fail in these environments because they suffer from one or more of the following drawbacks: they suffer from "the curse of dimensionality" as the number of agents grows (e.g., optimization-based methods); they do not account for interaction effects among the agents; or they provide no collision avoidance or trajectory feasibility guarantees (e.g., learning-based methods). In this work, we propose a novel navigation framework combining the strengths of learning-based and optimization-based algorithms. More specifically, we employ a Soft Actor-Critic agent to learn a continuous guidance policy that provides global guidance to an optimization-based planner, which generates feasible and collision-free trajectories. We evaluate our method in a highly interactive simulation environment, comparing it with two baseline approaches, a learning-based method and an optimization-based method, and present performance results demonstrating that our method significantly reduces the number of collisions and increases the success rate while causing fewer deadlocks. We also show that our method generalises to other traffic scenarios (e.g., an unprotected left turn).
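To make the structure of the framework concrete, below is a minimal sketch of the closed loop the abstract describes: a learned guidance policy proposes a subgoal, and a local planner turns it into a feasible command in receding-horizon fashion. All names here (`guidance_policy`, `mpc_step`) are hypothetical, the policy is a placeholder for the trained Soft Actor-Critic actor, and the planner is a heavily simplified stand-in for the thesis' constrained, collision-avoiding MPC; this is illustrative only, not the author's implementation.

```python
"""Minimal sketch of an RL-guided MPC loop (hypothetical names throughout)."""
import numpy as np

DT = 0.1       # discretisation step [s]
HORIZON = 10   # planning horizon length (number of steps)
V_MAX = 2.0    # assumed velocity bound used to cap the command [m/s]


def guidance_policy(state: np.ndarray) -> np.ndarray:
    """Placeholder for the SAC actor: in the thesis it would map the ego
    state plus observations of surrounding vehicles to a subgoal. Here it
    simply pushes the vehicle 3 m forward along x for demonstration."""
    return state[:2] + np.array([3.0, 0.0])


def mpc_step(pos: np.ndarray, subgoal: np.ndarray) -> np.ndarray:
    """Heavily simplified stand-in for the MPC: steer a 2-D single
    integrator toward the subgoal at a speed that would reach it within
    the horizon, capped at V_MAX. The real planner would additionally
    enforce vehicle dynamics and collision-avoidance constraints."""
    direction = subgoal - pos
    dist = np.linalg.norm(direction)
    if dist < 1e-6:
        return np.zeros(2)
    speed = min(V_MAX, dist / (HORIZON * DT))
    return speed * direction / dist


# Closed-loop rollout: the learned guidance sets the target at every step,
# the local planner executes it (receding-horizon style).
state = np.zeros(2)
for _ in range(50):
    subgoal = guidance_policy(state)   # global guidance (learned)
    u = mpc_step(state, subgoal)       # local, feasible command
    state = state + DT * u             # single-integrator dynamics
print("final position:", state)
```

The split mirrors the design choice in the abstract: the learned policy handles the long-horizon, interaction-aware decision (where to go next), while the optimization-based planner retains responsibility for feasibility and collision avoidance on the short horizon.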

Files

Thesis_Achin_final.pdf
(pdf | 2.77 MB)
License info not available