Learning a guidance policy to navigate among dynamic agents in constrained environments with continual reinforcement learning

Abstract

Mobile robots that operate in human environments must be able to navigate safely among humans and other obstacles. Existing approaches use Deep Reinforcement Learning (DRL) to obtain safe robot behavior in such environments, but they do not guarantee collision avoidance or trajectory feasibility. Methods that combine DRL with model predictive control (MPC) address this issue, yet they do not account for static obstacle avoidance. Moreover, DRL-based approaches train their networks on multiple environments of increasing difficulty to speed up training and improve generalization. When a model is trained sequentially on new (more complex) environments, the new knowledge can interfere with previously acquired knowledge, a problem known as catastrophic forgetting: the model performs well on the most challenging scenarios, but its performance on simpler scenarios deteriorates. This paper introduces a continual reinforcement learning (CRL) strategy that sequentially learns multiple navigation tasks while retaining performance on all previously learned tasks. For robot navigation, we combine MPC with DRL to develop an algorithm that avoids not only dynamic, interacting agents but also static obstacles. Our approach is shown to navigate safely to a goal position in multiple environments. In addition, by sequentially learning multiple tasks, it improves navigation performance over other DRL approaches in terms of success rate, travel time, and travel distance.
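The central idea in the abstract, training on a sequence of increasingly difficult navigation tasks while preserving performance on earlier ones, can be made concrete with a small sketch. The snippet below is not the paper's algorithm: the environment and policy classes are hypothetical placeholders, and rehearsal of stored transitions is only one of several possible continual-learning mechanisms, used here purely to illustrate how earlier tasks can be retained during sequential training.

```python
"""Minimal sketch of sequential (continual) training across navigation tasks.

Illustrative only: ToyNavEnv, ToyPolicy, and the rehearsal mechanism are
hypothetical stand-ins, not the method described in the paper.
"""
import random


class ToyNavEnv:
    """Placeholder environment; stands in for one navigation scenario."""
    def __init__(self, name, difficulty):
        self.name = name
        self.difficulty = difficulty

    def rollout(self, policy):
        # Fake "transitions"; a real setup would step a simulator.
        return [(self.name, step, policy.act(step)) for step in range(5)]


class ToyPolicy:
    """Placeholder policy with a trivial update rule."""
    def __init__(self):
        self.updates = 0

    def act(self, observation):
        return 0  # constant action; a real policy maps observations to actions

    def update(self, batch):
        self.updates += len(batch)


def train_sequentially(tasks, episodes_per_task=10, rehearsal_fraction=0.25):
    """Train on tasks in order, replaying old-task data to reduce forgetting.

    Rehearsal is just one possible continual-learning strategy; it is used
    here only to make "retaining previously learned tasks" concrete.
    """
    policy = ToyPolicy()
    memory = []  # transitions kept from previously learned tasks
    for env in tasks:
        for _ in range(episodes_per_task):
            batch = env.rollout(policy)
            if memory:
                k = max(1, int(rehearsal_fraction * len(batch)))
                batch += random.sample(memory, min(k, len(memory)))
            policy.update(batch)
        memory += env.rollout(policy)  # retain samples from this task
    return policy


if __name__ == "__main__":
    # Curriculum of increasingly difficult (hypothetical) navigation tasks.
    curriculum = [ToyNavEnv("static-obstacles", 1),
                  ToyNavEnv("dynamic-agents", 2),
                  ToyNavEnv("crowded-corridor", 3)]
    trained = train_sequentially(curriculum)
    print("policy updates:", trained.updates)
```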