Prioritizing states with action sensitive return in experience replay


Abstract

Experience replay for off-policy reinforcement learning has been shown to improve sample efficiency and stabilize training. However, uniformly sampled replay draws many samples that are irrelevant to the agent reaching good performance. We introduce Action Sensitive Experience Replay (ASER), a method that prioritizes samples in the replay buffer and selectively models parts of the state space more accurately where choosing sub-optimal actions has a larger effect on the return. We show experimentally that this can make training more sample efficient and that it allows smaller function approximators -- such as neural networks with few neurons -- to achieve good performance in environments where they would otherwise struggle.
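
The sketch below illustrates the general idea of sampling replay transitions in proportion to an action-sensitivity score. It is a minimal illustration, not the paper's implementation: the priority here is assumed to be the spread of estimated Q-values across actions in a state, and the class name `ActionSensitiveReplayBuffer` is hypothetical.

```python
import numpy as np


class ActionSensitiveReplayBuffer:
    """Replay buffer that samples transitions in proportion to an
    action-sensitivity score: how much the estimated return varies
    across the available actions in a state (an illustrative proxy,
    not necessarily ASER's exact criterion)."""

    def __init__(self, capacity, eps=1e-3):
        self.capacity = capacity
        self.eps = eps              # keeps every transition sampleable
        self.data = []              # (state, action, reward, next_state, done)
        self.priorities = []

    def add(self, transition, q_values):
        # q_values: estimated action values for the stored state,
        # e.g. the output of a Q-network forward pass.
        sensitivity = float(np.max(q_values) - np.min(q_values))
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(sensitivity + self.eps)

    def sample(self, batch_size):
        # Sample indices with probability proportional to priority.
        p = np.asarray(self.priorities)
        p = p / p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx]
```

States where all actions yield similar estimated returns contribute little to distinguishing good from bad behavior, so they are sampled less often under this scheme; the small `eps` term keeps them from being excluded entirely.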