Bayesian RL in factored POMDPs


Abstract

Robust decision-making agents in any non-trivial system must reason over multiple types of uncertainty, such as action outcomes, the agent's current state, and the dynamics of the environment. Outcome and state uncertainty are elegantly captured by the Partially Observable Markov Decision Process (POMDP) framework [1], which enables reasoning in stochastic, partially observable environments. POMDP solution methods, however, typically assume complete access to the system dynamics, which unfortunately is often not available. When such a model is not available, model-based Bayesian Reinforcement Learning (BRL) methods explicitly maintain a posterior over the possible models of the environment and use this knowledge to select actions that, in theory, trade off exploration and exploitation optimally. However, few BRL methods are applicable to partially observable settings, and those that are have limited scaling properties. The Bayes-Adaptive POMDP (BA-POMDP) [4], for example, models the environment in a tabular fashion, which poses a bottleneck for scalability. Here, we describe previous work [3] that proposes a method to overcome this bottleneck by representing the dynamics with a Bayesian network, an approach that exploits structure in the form of independence between state and observation features.
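To make the scalability argument concrete, the sketch below contrasts a tabular Dirichlet-count dynamics model (as in the BA-POMDP) with a factored one in which each state feature keeps counts conditioned only on its parent features. This is an illustrative sketch, not the authors' implementation; the feature names, domain sizes, and parent sets are assumptions chosen for the example.

```python
import numpy as np

# --- Tabular BA-POMDP-style counts: one Dirichlet per (state, action) pair ---
# The joint state space grows exponentially in the number of features.
n_states, n_actions = 1024, 4
tabular_counts = np.ones((n_states, n_actions, n_states))  # O(|S|^2 |A|) parameters

# --- Factored counts: one Dirichlet per feature, given its parents and the action ---
# Assumed (hypothetical) features and parent structure for a small grid-world-like domain.
feature_sizes = {"x": 8, "y": 8, "battery": 16}              # 8 * 8 * 16 = 1024 joint states
parents = {"x": ["x"], "y": ["y"], "battery": ["battery", "x"]}

factored_counts = {}
for f, size in feature_sizes.items():
    parent_size = int(np.prod([feature_sizes[p] for p in parents[f]]))
    # Dirichlet counts over next-feature values, given parent assignment and action.
    factored_counts[f] = np.ones((parent_size, n_actions, size))

n_tabular = tabular_counts.size
n_factored = sum(c.size for c in factored_counts.values())
print(f"tabular parameters:  {n_tabular}")    # 4,194,304
print(f"factored parameters: {n_factored}")   # 8,704 for this assumed structure
```

When the Bayesian network is sparse, the posterior over dynamics is maintained over far fewer parameters, which is the structural saving the factored approach exploits.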