Bayesian RL in factored POMDPs

Abstract (2019)
Author(s)

Sammie Katt (Northeastern University)

F.A. Oliehoek (TU Delft - Interactive Intelligence)

C Amato (Northeastern University)

Research Group
Interactive Intelligence
Copyright
© 2019 Sammie Katt, F.A. Oliehoek, Chris Amato
Publication Year
2019
Language
English
Pages (from-to)
1-3
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Robust decision-making agents in any non-trivial system must reason over several types of uncertainty, such as uncertainty about action outcomes, the agent's current state, and the dynamics of the environment. Outcome and state uncertainty are elegantly captured by the Partially Observable Markov Decision Process (POMDP) framework [1], which enables reasoning in stochastic, partially observable environments. POMDP solution methods, however, typically assume complete access to the system dynamics, which are often not available. When such a model is not available, model-based Bayesian Reinforcement Learning (BRL) methods explicitly maintain a posterior over the possible models of the environment and use this knowledge to select actions that, in theory, trade off exploration and exploitation optimally. However, few BRL methods are applicable to partially observable settings, and the few that are have limited scalability. The Bayes-Adaptive POMDP (BA-POMDP) [4], for example, models the environment in a tabular fashion, which poses a bottleneck for scalability. Here, we describe previous work [3] that proposes a method to overcome this bottleneck by representing the dynamics with a Bayes network, an approach that exploits structure in the form of independence between state and observation features.
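To make the scalability argument concrete, here is a minimal Python sketch built on assumptions of our own, not taken from the paper: a state is a tuple of binary features, the Bayes-network structure (each feature's parent set) is known and fixed, and only the transition model is learned. As in the BA-POMDP, uncertainty about the dynamics is represented by Dirichlet counts; a flat table needs one count per (s, a, s') triple, while the factored version needs counts only per (parent values, action, feature value) for each feature. The names FactoredDirichletTransition and tabular_num_parameters are hypothetical, and the sketch leaves out the observation model, the belief over states and counts, and the planning that a full solver would need.

```python
from collections import defaultdict
from itertools import product


class FactoredDirichletTransition:
    """Dirichlet counts for a factored (Bayes-network) transition model.

    Hypothetical sketch: a state is a tuple of binary features, and next-state
    feature i depends only on the action and on the current values of the
    features listed in parents[i]. The structure is assumed known and fixed.
    """

    def __init__(self, parents, n_actions, prior=1.0):
        self.parents = parents        # e.g. [(0,), (0, 1), (2,)]
        self.n_actions = n_actions
        # counts[i][(parent_values, a)] = [count for value 0, count for value 1]
        self.counts = [defaultdict(lambda: [prior, prior]) for _ in parents]

    def _key(self, i, s, a):
        # Only the parent features of feature i (and the action) index its counts.
        return (tuple(s[j] for j in self.parents[i]), a)

    def prob(self, s, a, s_next):
        """Posterior-mean P(s_next | s, a): product of per-feature Dirichlet means."""
        p = 1.0
        for i in range(len(self.parents)):
            c = self.counts[i][self._key(i, s, a)]
            p *= c[s_next[i]] / (c[0] + c[1])
        return p

    def update(self, s, a, s_next):
        """Conjugate update for one (hypothesized) transition: increment counts."""
        for i in range(len(self.parents)):
            self.counts[i][self._key(i, s, a)][s_next[i]] += 1

    def num_parameters(self):
        """Counts needed: 2 values per (feature, parent assignment, action)."""
        return sum(2 ** len(pa) * self.n_actions * 2 for pa in self.parents)


def tabular_num_parameters(n_features, n_actions):
    """Counts needed by a flat table over (s, a, s'), as in a tabular BA-POMDP."""
    n_states = 2 ** n_features
    return n_states * n_actions * n_states


if __name__ == "__main__":
    model = FactoredDirichletTransition(parents=[(0,), (0, 1), (2,)], n_actions=2)
    s, a, s_next = (0, 0, 0), 0, (1, 0, 0)
    print(model.prob(s, a, s_next))   # 0.125 under the uniform prior
    model.update(s, a, s_next)
    print(model.prob(s, a, s_next))   # (2/3) ** 3, about 0.296
    # Posterior-mean probabilities over all next states still sum to 1.
    print(sum(model.prob(s, a, sn) for sn in product((0, 1), repeat=3)))
    print(model.num_parameters(), tabular_num_parameters(3, 2))   # 32 128
```

For a sense of scale under the same toy assumptions: with 10 binary features, 4 actions, and two parents per feature, tabular_num_parameters(10, 4) returns 4,194,304 counts, whereas the factored model needs 320. The exact numbers depend entirely on the assumed structure; the point is only that the factored count grows with the parent-set sizes rather than with the full state space.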