The effect of sampling methods on Deep Q-Networks in robot navigation tasks
Abstract
Enabling mobile robots to autonomously navigate complex environments is essential for real-world deployment in commercial, industrial, military, health care, and domestic settings. Prior methods approach this problem by having the robot maintain an internal map of the world and then use a localization and planning method to navigate through that map. However, these approaches often rely on a variety of assumptions, are computationally intensive, and do not learn from failures. Recent work in deep reinforcement learning shows that navigational abilities can emerge as a by-product of an agent learning a policy that maximizes reward. Deep Q-Networks (DQN), a reinforcement learning algorithm, uses experience replay to remember and reuse experiences from the past. A sampling technique determines which experiences are drawn from the experience replay buffer to be replayed. Here we study the effect of different sampling techniques on the learning behavior of an agent using DQN in partially observable navigation tasks. In this work, five sampling techniques are proposed and compared to the original random sampling technique. We found that sampling techniques focusing on surprising experiences learn faster than random sampling. Secondly, we found that all sampling techniques usually converge to the same final policy. Finally, we found that correct use of importance sampling is essential when using prioritized sampling techniques.
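To make the prioritized setting concrete, the sketch below shows one way a replay buffer can sample "surprising" experiences (large TD error) proportionally to their priority while applying importance-sampling weights to correct the resulting bias, in the spirit of prioritized experience replay (Schaul et al., 2016). The class name, hyperparameters (alpha, beta), and buffer layout are illustrative assumptions, not the specific implementation studied in this work.

```python
# Minimal sketch of proportional prioritized sampling with importance-sampling
# (IS) correction. Names and defaults are assumptions for illustration only.
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling (0 = uniform)
        self.storage = []           # transitions (s, a, r, s_next, done)
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New experiences get the current maximum priority so they are replayed at least once.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.storage), batch_size, p=probs)
        # IS weights correct the bias introduced by non-uniform sampling;
        # omitting them changes the learning target the DQN update optimizes.
        weights = (len(self.storage) * probs[indices]) ** (-beta)
        weights /= weights.max()    # normalize for numerical stability
        batch = [self.storage[i] for i in indices]
        return batch, indices, weights

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # "Surprising" experiences (large absolute TD error) get higher priority.
        self.priorities[indices] = np.abs(td_errors) + eps
```

In a DQN training loop, the returned `weights` would typically multiply the per-sample TD loss before the gradient step, and `update_priorities` would be called with the new TD errors after each update.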