BADDr

Bayes-Adaptive Deep Dropout RL for POMDPs

Conference Paper (2022)
Author(s)

Sammie Katt (Northeastern University)

Hai Nguyen (Northeastern University)

F.A. Oliehoek (TU Delft - Interactive Intelligence)

Christopher Amato (Northeastern University)

Research Group
Interactive Intelligence
Copyright
© 2022 Sammie Katt, Hai Nguyen, F.A. Oliehoek, Christopher Amato
Publication Year
2022
Language
English
Pages (from-to)
723-731
ISBN (electronic)
978-171385433-3
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance, we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.
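To make the dropout idea concrete, below is a minimal sketch (assuming PyTorch) of the Monte-Carlo dropout mechanism the abstract alludes to: keeping dropout active at inference time turns a single learned dynamics network into an approximate posterior over dynamics models, so each stochastic forward pass plays the role of one sampled model. All class and function names here are hypothetical illustrations, not the authors' implementation.

import torch
import torch.nn as nn

class DropoutDynamicsModel(nn.Module):
    """Predicts the next state from (state, action); dropout layers
    provide the approximate Bayesian uncertainty over dynamics."""
    def __init__(self, state_dim: int, action_dim: int,
                 hidden: int = 64, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Dropout(p),  # kept active at inference time (MC dropout)
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def sample_dynamics_predictions(model: DropoutDynamicsModel,
                                state: torch.Tensor,
                                action: torch.Tensor,
                                n_samples: int = 10) -> torch.Tensor:
    """Each stochastic forward pass corresponds to one sampled dynamics
    model; the spread across samples reflects posterior uncertainty."""
    model.train()  # keep dropout on, i.e. the MC-dropout trick
    with torch.no_grad():
        return torch.stack([model(state, action) for _ in range(n_samples)])

# Hypothetical usage: a planner such as Monte-Carlo tree search can
# exploit this by simulating each rollout with a freshly sampled model.
model = DropoutDynamicsModel(state_dim=4, action_dim=2)
s = torch.randn(1, 4)
a = torch.randn(1, 2)
preds = sample_dynamics_predictions(model, s, a, n_samples=20)
print(preds.mean(dim=0), preds.std(dim=0))

Under these assumptions, the per-rollout model samples are what make the belief over dynamics a tractable inference problem, in contrast to maintaining an explicit distribution over tabular dynamics.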

Files

3535850.3535932.pdf
(pdf | 2.09 Mb)
- Embargo expired in 05-12-2022
License info not available