PEBL: Pessimistic Ensembles for Offline Deep Reinforcement Learning

Smit, Jordi; Ponnambalam, C.T.; Spaan, M.T.J.; Oliehoek, F.A.

Title

PEBL: Pessimistic Ensembles for Offline Deep Reinforcement Learning

Author

Smit, Jordi (Student TU Delft)
Ponnambalam, C.T. (TU Delft Algorithmics)
Spaan, M.T.J. (TU Delft Algorithmics)
Oliehoek, F.A. (TU Delft Interactive Intelligence)

Date

2021

Abstract

Offline reinforcement learning (RL), or learning from a fixed data set, is an attractive alternative to online RL. Offline RL promises to address the cost and safety implications of taking numerous random or bad actions online, a crucial aspect of traditional RL that makes it difficult to apply in real-world problems. However, when RL is naïvely applied to a fixed data set, the resulting policy may exhibit poor performance in the real environment. This happens due to over-estimation of the value of state-action pairs not sufficiently covered by the data set. A promising way to avoid this is by applying pessimism and acting according to a lower bound estimate on the value. It has been shown that penalizing the learned value according to a pessimistic bound on the uncertainty can drastically improve offline RL. In deep reinforcement learning, however, uncertainty estimation is highly non-trivial and development of effective uncertainty-based pessimistic algorithms remains an open question. This paper introduces two novel offline deep RL methods built on Double Deep Q-Learning and Soft Actor-Critic. We show how a multi-headed bootstrap approach to uncertainty estimation is used to calculate an effective pessimistic value penalty. Our approach is applied to benchmark offline deep RL domains, where we demonstrate that our methods can often beat the current state-of-the-art.
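
The idea of acting on a lower bound derived from a multi-headed ensemble can be illustrated with a small sketch. The abstract does not state the exact form of the penalty, so this example assumes one common choice: the mean of the heads' Q-values minus a multiple of their standard deviation. The function names, the `beta` coefficient, and the example values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def pessimistic_q(head_values, beta=1.0):
    """Lower-bound Q estimate for one action.

    Assumed penalty form (not from the paper): ensemble mean
    minus beta times the standard deviation across heads.
    """
    return mean(head_values) - beta * pstdev(head_values)


def pessimistic_greedy_action(q_heads, beta=1.0):
    """Pick the action with the highest lower-bound value.

    q_heads[k][a] is head k's Q estimate for action a in one state.
    """
    n_actions = len(q_heads[0])
    lower_bounds = [
        pessimistic_q([head[a] for head in q_heads], beta)
        for a in range(n_actions)
    ]
    best = max(range(n_actions), key=lambda a: lower_bounds[a])
    return best, lower_bounds


# Hypothetical values: three heads, two actions.
# The heads agree on action 0 and disagree strongly on action 1,
# as might happen for an action poorly covered by the data set.
q_heads = [
    [1.0, 0.0],
    [1.0, 2.0],
    [1.0, 4.0],
]
action, bounds = pessimistic_greedy_action(q_heads, beta=1.0)
```

With `beta=1.0` the sketch prefers action 0, whose value the heads agree on, even though action 1 has the higher mean. Setting `beta=0.0` removes the penalty and recovers the plain greedy choice over the ensemble mean.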

To reference this document use:

http://resolver.tudelft.nl/uuid:2053b579-a663-4def-ad25-4bedad0169be

Source

Robust and Reliable Autonomy in the Wild Workshop at the 30th International Joint Conference on Artificial Intelligence

Event

Robust and Reliable Autonomy in the Wild Workshop at the 30th International Joint Conference on Artificial Intelligence, 2021-08-19

Part of collection

Institutional Repository

Document type

conference paper

Files

PDF

R2AW_paper_6_1.pdf

434.75 KB