Title: PEBL: Pessimistic Ensembles for Offline Deep Reinforcement Learning
Author: Smit, Jordi (Student TU Delft); Ponnambalam, C.T. (TU Delft Algorithmics); Spaan, M.T.J. (TU Delft Algorithmics); Oliehoek, F.A. (TU Delft Interactive Intelligence)
Date: 2021
Abstract: Offline reinforcement learning (RL), or learning from a fixed data set, is an attractive alternative to online RL. Offline RL promises to address the cost and safety implications of taking numerous random or bad actions online, a crucial aspect of traditional RL that makes it difficult to apply in real-world problems. However, when RL is naïvely applied to a fixed data set, the resulting policy may exhibit poor performance in the real environment. This happens due to over-estimation of the value of state-action pairs not sufficiently covered by the data set. A promising way to avoid this is by applying pessimism and acting according to a lower bound estimate on the value. It has been shown that penalizing the learned value according to a pessimistic bound on the uncertainty can drastically improve offline RL. In deep reinforcement learning, however, uncertainty estimation is highly non-trivial and development of effective uncertainty-based pessimistic algorithms remains an open question. This paper introduces two novel offline deep RL methods built on Double Deep Q-Learning and Soft Actor-Critic. We show how a multi-headed bootstrap approach to uncertainty estimation is used to calculate an effective pessimistic value penalty. Our approach is applied to benchmark offline deep RL domains, where we demonstrate that our methods can often beat the current state-of-the-art.
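The idea of acting on a lower-bound value estimate derived from several bootstrap heads can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not state the exact penalty, so the sketch assumes one common form, the mean over heads minus `beta` times the standard deviation across heads. The function names, the `beta` parameter, and the example numbers are all hypothetical.

```python
from statistics import mean, pstdev


def pessimistic_q_values(head_q_values, beta=1.0):
    """Return a lower-bound value per action from an ensemble of Q-heads.

    head_q_values[k][a] is the estimate of head k for action a in one state.
    Assumed penalty (not taken from the paper): beta * std across heads.
    """
    lower_bounds = []
    for per_head in zip(*head_q_values):  # iterate over actions
        lower_bounds.append(mean(per_head) - beta * pstdev(per_head))
    return lower_bounds


def greedy_action(values):
    """Index of the action with the largest value."""
    return max(range(len(values)), key=lambda a: values[a])


# Hypothetical estimates from 3 heads for 2 actions in a single state.
# The heads agree on action 0 and disagree strongly on action 1,
# as might happen when action 1 is poorly covered by the data set.
heads = [[1.0, 3.0], [1.0, 0.0], [1.0, 3.0]]

plain = [mean(per_head) for per_head in zip(*heads)]
pessimistic = pessimistic_q_values(heads, beta=1.0)

print("greedy on mean:       ", greedy_action(plain))
print("greedy on lower bound:", greedy_action(pessimistic))
```

In this toy state the ensemble mean favors the action the heads disagree on, while the penalized estimate favors the action whose value all heads agree on, which is the behavior pessimism is meant to produce for poorly covered state-action pairs.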
To reference this document use: http://resolver.tudelft.nl/uuid:2053b579-a663-4def-ad25-4bedad0169be
Source: Robust and Reliable Autonomy in the Wild Workshop at the 30th International Joint Conference of Artificial Intelligence
Event: Robust and Reliable Autonomy in the Wild Workshop at the 30th International Joint Conference of Artificial Intelligence, 2021-08-19
Part of collection: Institutional Repository
Document type: conference paper
Rights: © 2021 Jordi Smit, C.T. Ponnambalam, M.T.J. Spaan, F.A. Oliehoek
Files: R2AW_paper_6_1.pdf (PDF, 434.75 KB)