Improved deep reinforcement learning for robotics through distribution-based experience retention

Conference Paper (2016)
Author(s)

Tim de Bruin (TU Delft - OLD Intelligent Control & Robotics)

J. Kober (TU Delft - OLD Intelligent Control & Robotics)

K.P. Tuyls (University of Liverpool, TU Delft - Delft Center for Systems and Control)

R. Babuska (TU Delft - OLD Intelligent Control & Robotics)

Research Group
OLD Intelligent Control & Robotics
Copyright
© 2016 T.D. de Bruin, J. Kober, K.P. Tuyls, R. Babuska
DOI
https://doi.org/10.1109/IROS.2016.7759581
Publication Year
2016
Language
English
Pages (from-to)
3947-3952
ISBN (print)
978-1-5090-3762-9
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recent years have seen a growing interest in the use of deep neural networks as function approximators in reinforcement learning. In this paper, an experience replay method is proposed that ensures that the distribution of the experiences used for training lies between that of the policy and a uniform distribution. Through experiments on a magnetic manipulation task it is shown that the method reduces the need for sustained exhaustive exploration during learning. This makes it attractive in scenarios where sustained exploration is infeasible or undesirable, such as for physical systems like robots and for lifelong learning. The method is also shown to improve the generalization performance of the trained policy, which can make it attractive for transfer learning. Finally, for small experience databases the method performs favorably when compared to the recently proposed alternative of using the temporal difference error to determine the experience sample distribution, which makes it an attractive option for robots with limited memory capacity.
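The core idea of the abstract, training on an experience distribution between the policy's own and a uniform one, can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' exact algorithm: it pairs a plain FIFO buffer (approximating the policy distribution) with a "spread" buffer that overwrites experiences in densely sampled regions of state space (a nearest-neighbour heuristic), pushing its contents toward uniform coverage. Sampling a mixture of the two places the training distribution between the two extremes.

```python
import random
import numpy as np

class MixedReplayBuffer:
    """Illustrative sketch of distribution-based experience retention.

    fifo   : sliding window of recent experiences (on-policy distribution).
    spread : experiences retained for state-space coverage; when full, the
             experience in the most crowded region is overwritten.
    """

    def __init__(self, capacity, mix=0.5):
        self.capacity = capacity
        self.mix = mix        # fraction of each batch drawn from the spread buffer
        self.fifo = []
        self.spread = []

    def add(self, state, action, reward, next_state):
        exp = (np.asarray(state, dtype=float), action, reward, next_state)
        # FIFO buffer: plain sliding window over recent experience.
        self.fifo.append(exp)
        if len(self.fifo) > self.capacity:
            self.fifo.pop(0)
        # Spread buffer: overwrite the stored experience whose state has the
        # closest nearest neighbour, i.e. the most over-represented region.
        if len(self.spread) < self.capacity:
            self.spread.append(exp)
        else:
            states = np.stack([e[0] for e in self.spread])
            d = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
            np.fill_diagonal(d, np.inf)
            crowded = int(np.argmin(d.min(axis=1)))
            self.spread[crowded] = exp

    def sample(self, batch_size):
        # Draw a mixture: `mix` of the batch from the spread buffer,
        # the remainder from the FIFO buffer.
        n_spread = int(round(self.mix * batch_size))
        batch = random.sample(self.spread, min(n_spread, len(self.spread)))
        batch += random.sample(self.fifo, min(batch_size - len(batch), len(self.fifo)))
        return batch
```

With `mix = 0` training sees only the (approximately on-policy) FIFO contents, with `mix = 1` only the coverage-oriented buffer; intermediate values yield a distribution between the two, which is the regime the paper argues for.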

Files

DeBruinIROS2016.pdf
(pdf | 3.23 Mb)