Prioritizing states with action sensitive return in experience replay
A. Keijzer (TU Delft - Mechanical Engineering)
J. Kober – Mentor (TU Delft - Mechanical Engineering)
D.S. van der Heijden – Mentor (TU Delft - Mechanical Engineering)
R. Babuska – Graduation committee member (TU Delft - Mechanical Engineering)
J.W. Böhmer – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Experience replay for off-policy reinforcement learning has been shown to improve sample efficiency and stabilize training. However, the uniformly sampled replay that is typically used includes many samples that are irrelevant to the agent reaching good performance. We introduce Action Sensitive Experience Replay (ASER), a method that prioritizes samples in the replay buffer so that the parts of the state space where choosing sub-optimal actions has a larger effect on the return are modeled more accurately. We show experimentally that this can make training more sample efficient and that it allows smaller function approximators, such as neural networks with few neurons, to achieve good performance in environments where they would otherwise struggle.
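To make the idea of action-sensitive prioritization concrete, the following is a minimal sketch, not the ASER algorithm as defined in the thesis. It assumes a hypothetical sensitivity proxy (the gap between the best and the mean estimated action value in a state) and samples stored transitions in proportion to it. The function names, the `alpha` and `eps` parameters, and the toy Q-values are all illustrative assumptions.

```python
import random


def action_sensitivity(q_values):
    """Hypothetical proxy: how much return is lost, on average, by not
    picking the best action (best action value minus mean action value)."""
    return max(q_values) - sum(q_values) / len(q_values)


def sampling_weights(sensitivities, alpha=1.0, eps=1e-3):
    """Normalize sensitivities into sampling probabilities.
    eps (assumed) keeps every sample reachable; alpha (assumed)
    controls how strongly sampling is skewed."""
    raw = [(s + eps) ** alpha for s in sensitivities]
    total = sum(raw)
    return [r / total for r in raw]


def sample_batch(buffer, weights, batch_size, rng):
    """Draw a batch with replacement, proportional to the weights."""
    return rng.choices(buffer, weights=weights, k=batch_size)


# Toy buffer: (state label, estimated Q-values per action); made-up numbers.
buffer = [
    ("flat", [1.0, 1.0]),     # action choice does not matter here
    ("mild", [0.0, 2.0]),     # moderate effect of the action
    ("steep", [-5.0, 5.0]),   # a wrong action is costly
]

sensitivities = [action_sensitivity(q) for _, q in buffer]
weights = sampling_weights(sensitivities)
batch = sample_batch(buffer, weights, batch_size=4, rng=random.Random(0))
```

Under this assumed proxy, the "steep" state receives by far the largest sampling weight and the "flat" state is drawn only rarely, which mirrors the intuition in the abstract: learning effort is concentrated where the choice of action matters most for the return.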