Know what it does not know

Improving Offline Deep Reinforcement Learning with Uncertainty Estimation

Abstract

Offline reinforcement learning, or learning from a fixed data set, is an attractive alternative to online reinforcement learning. It promises to avoid the cost and safety implications of taking numerous random or bad actions online, a drawback of traditional reinforcement learning that makes it difficult to apply to real-world problems. However, when offline reinforcement learning is naïvely applied to a fixed data set, the resulting policy may perform poorly in the real environment, because the expected return is overestimated for state-action pairs not sufficiently covered by the data set. Offline reinforcement learning agents must therefore know what they do not know, so that they can avoid these overestimated state-action pairs and their potentially erroneous outcomes. A promising way to instill this ability in agents is the pessimism principle, which states that agents should select actions that maximize an uncertainty-based lower bound on the expected return. This principle has drastically improved the performance of offline reinforcement learning methods in the tabular and linear function approximation settings. In deep reinforcement learning, however, uncertainty estimation is highly non-trivial, and the development of effective uncertainty-based pessimistic algorithms remains an open problem. In this thesis, we therefore explore various existing deep learning-based uncertainty estimation techniques, with the aim of combining them with existing deep reinforcement learning methods into an uncertainty-aware offline deep reinforcement learning algorithm. This research has resulted in two novel offline deep reinforcement learning methods, built on Double Deep Q-Learning and Soft Actor-Critic. We applied these methods to various benchmarks and experiments to demonstrate their interesting and unique properties; in some situations, they even beat the current state-of-the-art results on these benchmarks.
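
To make the pessimism principle concrete, the sketch below illustrates one common way it can be instantiated: picking actions by a lower confidence bound computed from an ensemble of Q-value estimates, where ensemble disagreement serves as the uncertainty proxy. This is a minimal illustration of the general idea, not the thesis's actual algorithm; the function name, the pessimism coefficient beta, and the use of an ensemble are all assumptions for this example.

```python
import numpy as np

def pessimistic_action(q_ensemble: np.ndarray, beta: float = 1.0) -> int:
    """Select the action maximizing an uncertainty-based lower bound.

    q_ensemble: array of shape (n_models, n_actions) with Q-value
        estimates for a single state from an ensemble of networks.
    beta: pessimism coefficient (assumed hyperparameter); larger values
        penalize actions with uncertain value estimates more heavily.
    """
    mean = q_ensemble.mean(axis=0)    # expected return per action
    std = q_ensemble.std(axis=0)      # ensemble disagreement as uncertainty
    lower_bound = mean - beta * std   # pessimistic estimate of the return
    return int(np.argmax(lower_bound))

# Example: action 1 has the higher mean return but much higher ensemble
# disagreement, so a mean-greedy agent picks it while the pessimistic
# agent avoids it and picks the well-supported action 0.
q = np.array([[1.0, 0.9],
              [1.1, 2.5],
              [0.9, 0.2]])
print(np.argmax(q.mean(axis=0)))   # 1 (greedy on the mean)
print(pessimistic_action(q))       # 0 (greedy on the lower bound)
```

In an offline setting, state-action pairs that are poorly covered by the fixed data set tend to produce exactly this kind of disagreement between ensemble members, which is why such a lower bound steers the agent away from overestimated actions.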