Pessimistic Iterative Planning with RNNs for Robust POMDPs
Maris F.L. Galesloot (Radboud Universiteit Nijmegen)
Marnix Suilen (Flanders Make)
Thiago D. Simão (Eindhoven University of Technology, TU Delft - Sequential Decision Making)
Steven Carr (The University of Texas at Austin)
Matthijs T.J. Spaan (TU Delft - Sequential Decision Making)
Ufuk Topcu (The University of Texas at Austin)
Nils Jansen (Radboud Universiteit Nijmegen, Center for Interface-Dominated High Performance Materials)
Abstract
Robust POMDPs extend classical POMDPs to incorporate model uncertainty using so-called uncertainty sets on the transition and observation functions, effectively defining ranges of probabilities. Policies for robust POMDPs must be (1) memory-based to account for partial observability and (2) robust against model uncertainty to account for the worst-case probability instances from the uncertainty sets. To compute such robust memory-based policies, we propose the pessimistic iterative planning (PIP) framework, which alternates between (1) selecting pessimistic POMDPs via worst-case probability instances from the uncertainty sets, and (2) computing finite-state controllers (FSCs) for these pessimistic POMDPs. Within PIP, we propose the RFSCNET algorithm, which optimizes a recurrent neural network to compute the FSCs. The empirical evaluation shows that RFSCNET can compute better-performing robust policies than several baselines and a state-of-the-art robust POMDP solver.
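To make the notion of a "worst-case probability instance" concrete, the following is a minimal sketch, not the authors' PIP or RFSCNET implementation. It assumes the uncertainty set for a single state-action pair is a vector of probability intervals, and that some estimate of the value of each successor state is already available. The function name `worst_case_distribution` and its arguments are hypothetical and introduced only for illustration. Under these assumptions, the distribution that minimizes expected value can be found greedily: start from the lower bounds and hand the remaining probability mass to the lowest-valued successors first.

```python
from typing import List


def worst_case_distribution(
    lower: List[float], upper: List[float], values: List[float]
) -> List[float]:
    """Return a distribution p with lower[i] <= p[i] <= upper[i] and sum(p) == 1
    that minimises sum(p[i] * values[i]).

    Hypothetical helper: assumes interval uncertainty and a fixed value estimate.
    """
    eps = 1e-9
    if len(lower) != len(upper) or len(lower) != len(values):
        raise ValueError("lower, upper and values must have the same length")
    if sum(lower) > 1.0 + eps or sum(upper) < 1.0 - eps:
        raise ValueError("intervals do not contain a valid distribution")

    # Every successor receives at least its lower bound.
    probs = list(lower)
    budget = 1.0 - sum(lower)

    # Spend the remaining mass on the lowest-valued successors first.
    for i in sorted(range(len(values)), key=lambda j: values[j]):
        extra = min(upper[i] - lower[i], max(budget, 0.0))
        probs[i] += extra
        budget -= extra
    return probs


def expected_value(probs: List[float], values: List[float]) -> float:
    """Expected successor value under the given distribution."""
    return sum(p * v for p, v in zip(probs, values))
```

For example, with intervals `[0.1, 0.5]`, `[0.2, 0.6]`, `[0.3, 0.7]` and successor values `10.0`, `0.0`, `5.0`, the sketch assigns all spare mass to the second successor, giving approximately `[0.1, 0.6, 0.3]` up to floating-point rounding. In an alternating scheme like the one the abstract describes, a step of this kind would be repeated for each state-action pair to fix one pessimistic POMDP before a policy is computed for it. How the paper actually selects these instances is not specified here.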