Comparing Deep Reinforcement Learning Approaches for Sparse Reward Settings with Discrete State-Action Spaces

Abstract

One of the most challenging types of environment for a Deep Reinforcement Learning agent to learn in is one with a sparse reward function. Algorithms designed to perform well under sparse rewards exist, but they are typically applied to continuous state-action spaces, since economically relevant problems such as robotic control and stock trading fall into this category. As a result, the continuous version of the sparse reward problem overshadows its discrete state-action counterpart. Furthermore, research on sparse rewards rarely compares algorithms dedicated to this type of setting against other state-of-the-art Deep Reinforcement Learning algorithms. We devise an experimental setup to test a selection of algorithms from three state-of-the-art Deep Reinforcement Learning approaches: Hindsight Experience Replay, Maximum Entropy Reinforcement Learning, and Distributional Reinforcement Learning. We show that as the cardinality of the state space in sparse reward settings increases, Hindsight Experience Replay approaches become superior in sample efficiency compared to the other two approaches studied.