Off-policy experience retention for deep actor-critic learning
Tim de Bruin (TU Delft - OLD Intelligent Control & Robotics)
Jens Kober (TU Delft - OLD Intelligent Control & Robotics)
K.P. Tuyls (TU Delft - OLD Intelligent Control & Robotics, University of Liverpool)
Robert Babuska (TU Delft - OLD Intelligent Control & Robotics)
Abstract
When a limited number of experiences is kept in memory to train a reinforcement learning agent, the criterion that determines which experiences are retained can have a strong impact on learning performance. In this paper, we argue that for actor-critic learning in domains with significant momentum, it is important to retain experiences with off-policy actions when the amount of exploration is reduced over time. This claim is supported by simulation experiments on a pendulum swing-up problem and a magnetic manipulation task. Additionally, we compare our strategy to database overwriting policies that aim to spread the retained experiences over the state-action space, and to policies that use the temporal-difference error as a proxy for the value of experiences.
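To make the idea of a retention criterion concrete, the sketch below shows one possible overwrite rule for a fixed-size experience database: when the buffer is full, the transition whose stored action is closest to what the current policy would select is overwritten first, so that off-policy experiences are preferentially retained. This is only an illustration under assumed interfaces (a deterministic actor callable `policy(state)` and array-valued states and actions), not the authors' implementation.

```python
import numpy as np


class OffPolicyRetentionBuffer:
    """Fixed-size experience buffer that overwrites near-on-policy transitions first.

    Hypothetical sketch: the retention criterion here is the Euclidean distance
    between a stored action and the action the current policy would take in the
    same state; small distance means the experience is nearly on-policy.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []  # list of (state, action, reward, next_state) tuples

    def add(self, transition, policy):
        """Store a transition; if full, replace the most on-policy experience."""
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            return
        distances = [
            np.linalg.norm(action - policy(state))
            for (state, action, _, _) in self.buffer
        ]
        self.buffer[int(np.argmin(distances))] = transition

    def sample(self, batch_size, rng=np.random):
        """Draw a uniform minibatch of stored transitions."""
        idx = rng.choice(len(self.buffer), size=batch_size, replace=True)
        return [self.buffer[i] for i in idx]


# Toy usage with a hypothetical 1-D deterministic actor.
if __name__ == "__main__":
    policy = lambda s: np.tanh(s)  # stand-in for a learned actor network
    buf = OffPolicyRetentionBuffer(capacity=1000)
    state = np.zeros(1)
    buf.add((state, np.array([0.5]), 0.0, state), policy)
```

A FIFO buffer or a TD-error-based rule would differ only in how the index to overwrite is chosen, which is exactly the design dimension the comparison in the paper addresses.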