The effect of sampling methods on Deep Q-Networks in robot navigation tasks
Abstract
Enabling mobile robots to autonomously navigate complex environments is essential for real-world deployment in commercial, industrial, military, health care, and domestic settings. Prior methods approach this problem by having the robot maintain an internal map of the world and then use a localization and planning method to navigate through that map. However, these approaches often rely on a variety of assumptions, are computationally intensive, and do not learn from failures. Recent work in deep reinforcement learning shows that navigational abilities can emerge as a by-product of an agent learning a policy that maximizes reward. Deep Q-Networks (DQN), a reinforcement learning algorithm, uses experience replay to remember and reuse experiences from the past. A sampling technique determines which experiences are drawn from the experience replay buffer to be replayed. Here we study the effect of different sampling techniques on the learning behavior of an agent using DQN in partially observable navigation tasks. In this work, five sampling techniques are proposed and compared to the original random sampling technique. We found that sampling techniques focusing on surprising experiences learn faster than random sampling. Secondly, we found that all sampling techniques usually converge to the same final policy. Finally, we found that correct use of importance sampling is essential when using prioritized sampling techniques.
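To make the prioritized setting concrete, the sketch below shows one way a replay buffer can sample "surprising" experiences (large TD error) proportionally to their priority while applying importance-sampling weights to correct the resulting bias, in the spirit of prioritized experience replay (Schaul et al., 2016). The class name, hyperparameters (alpha, beta), and buffer layout are illustrative assumptions, not the specific implementation studied in this work.

```python
# Minimal sketch of proportional prioritized sampling with importance-sampling
# (IS) correction. Names and defaults are assumptions for illustration only.
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling (0 = uniform)
        self.storage = []           # transitions (s, a, r, s_next, done)
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New experiences get the current maximum priority so they are replayed at least once.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.storage), batch_size, p=probs)
        # IS weights correct the bias introduced by non-uniform sampling;
        # omitting them changes the learning target the DQN update optimizes.
        weights = (len(self.storage) * probs[indices]) ** (-beta)
        weights /= weights.max()    # normalize for numerical stability
        batch = [self.storage[i] for i in indices]
        return batch, indices, weights

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # "Surprising" experiences (large absolute TD error) get higher priority.
        self.priorities[indices] = np.abs(td_errors) + eps
```

In a DQN training loop, the returned `weights` would typically multiply the per-sample TD loss before the gradient step, and `update_priorities` would be called with the new TD errors after each update.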