Prioritizing states with action sensitive return in experience replay

Master Thesis (2023)
Author(s)

A. Keijzer (TU Delft - Mechanical Engineering)

Contributor(s)

J. Kober – Mentor (TU Delft - Learning & Autonomous Control)

Bas van der Heijden – Mentor (TU Delft - Learning & Autonomous Control)

R. Babuska – Graduation committee member (TU Delft - Learning & Autonomous Control)

Wendelin Böhmer – Graduation committee member (TU Delft - Algorithmics)

Faculty
Mechanical Engineering
Copyright
© 2023 Alexander Keijzer
Publication Year
2023
Language
English
Graduation Date
16-06-2023
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Experience replay for off-policy reinforcement learning has been shown to improve sample efficiency and stabilize training. However, with typical uniform sampling, the replay buffer yields many samples that are irrelevant to the agent reaching good performance. We introduce Action Sensitive Experience Replay (ASER), a method that prioritizes samples in the replay buffer so that parts of the state-space where choosing a sub-optimal action has a larger effect on the return are modelled more accurately. We experimentally show that this can make training more sample efficient and can allow small function approximators -- like neural networks with few neurons -- to achieve good performance in environments where they would otherwise struggle.
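The abstract describes prioritizing replay samples by how much the choice of action affects the return. The exact ASER priority metric is not given here, so the following is a minimal sketch of priority-proportional replay sampling, using the spread of estimated Q-values across actions at a state as a hypothetical stand-in for the "action sensitivity" score; the class and parameter names are illustrative, not taken from the thesis.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal sketch of priority-proportional experience replay.

    The priority uses the gap between the best and worst action
    value at a state as a stand-in 'action sensitivity' score;
    this is an assumption, not the metric defined by ASER.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities skew sampling
        self.buffer = []
        self.priorities = []

    def add(self, transition, q_values):
        # Proxy for action sensitivity: spread of Q-values over actions.
        sensitivity = float(np.max(q_values) - np.min(q_values))
        priority = (sensitivity + 1e-6) ** self.alpha
        if len(self.buffer) >= self.capacity:
            # Drop the oldest transition when full (FIFO eviction).
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample transitions with probability proportional to priority,
        # so action-sensitive states are replayed more often.
        probs = np.array(self.priorities) / np.sum(self.priorities)
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx]
```

With `alpha = 0` the sampling reduces to uniform replay; larger `alpha` concentrates training on transitions where action choice matters most for the return, matching the intuition in the abstract.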
