Learning and Optimizing Probabilistic Models for Planning under Uncertainty

Master Thesis (2017)
Author(s)

R. van Bekkum (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.T.J. Spaan – Mentor

Marco Loog – Graduation committee member

J. Kober – Graduation committee member

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2017 Rob van Bekkum
Publication Year
2017
Language
English
Graduation Date
27-09-2017
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Decision-theoretic planning techniques are increasingly being used to obtain (optimal) plans for domains involving uncertainty, which may stem from the controlling agent's actions, its percepts, or exogenous factors in the domain. These techniques build on detailed probabilistic models of the underlying system, for which Markov Decision Processes (MDPs) have become the de facto standard formalism. However, handcrafting these probabilistic models is usually a daunting and error-prone task, requiring expert knowledge of the domain under consideration. It is therefore desirable to automate the process of obtaining these models by means of learning algorithms presented with a set of execution traces from the system. Although some work has already been done on crafting such learning algorithms, the state of the art lacks an automated method for configuring their hyperparameters so as to maximize the performance obtained by executing the derived plans.
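As a rough illustration (not taken from the thesis itself), the basic idea of learning an MDP's transition model from execution traces can be sketched as maximum-likelihood estimation over state-action-successor counts; the trace format and names below are assumptions for the example:

```python
from collections import defaultdict

def estimate_transition_model(traces):
    """Maximum-likelihood estimate of an MDP transition model P(s' | s, a)
    from execution traces, each a list of (state, action, next_state) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for trace in traces:
        for s, a, s_next in trace:
            counts[(s, a)][s_next] += 1
            totals[(s, a)] += 1
    # Normalize counts into conditional probability distributions.
    return {sa: {sn: n / totals[sa] for sn, n in succ.items()}
            for sa, succ in counts.items()}

# Two toy traces: from s0, action a sometimes reaches s1 and sometimes stays.
traces = [
    [("s0", "a", "s1"), ("s1", "a", "s0"), ("s0", "a", "s1")],
    [("s0", "a", "s0")],
]
model = estimate_transition_model(traces)
# model[("s0", "a")] -> {"s1": 2/3, "s0": 1/3}
```

Learning algorithms of this kind typically introduce hyperparameters (e.g. smoothing priors or state-abstraction choices) whose settings this thesis proposes to tune automatically.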
In this work we present a method that employs the Bayesian Optimization (BO) framework to learn MDPs autonomously from a set of execution traces, optimizing the expected value and simulated performance over a set of tasks the underlying system is expected to perform. The approach has been tested on learning MDPs for mobile robot navigation, motivated by the significant uncertainty accompanying the robots' actions in this domain.
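A minimal sketch of a BO-style hyperparameter loop, under assumptions not drawn from the thesis: a single hyperparameter `theta`, a hypothetical `performance` objective standing in for the average simulated return of the derived plan, a simple Gaussian-process surrogate, and the expected-improvement acquisition function (the thesis's actual objective and BO configuration may differ):

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel on scalar inputs.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # GP posterior mean and standard deviation at query points Xs, given data (X, y).
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.clip(np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    # Expected-improvement acquisition for maximization.
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)
    return (mu - best) * cdf + sigma * pdf

def performance(theta):
    # Hypothetical stand-in for "average simulated return of the plan
    # derived from the MDP learned with hyperparameter theta".
    return -(theta - 0.6) ** 2

candidates = np.linspace(0.0, 1.0, 101)
sampled = [0, 50, 100]  # initial design: theta = 0.0, 0.5, 1.0
for _ in range(10):
    X = candidates[sampled]
    y = np.array([performance(t) for t in X])
    mu, sigma = gp_posterior(X, y, candidates)
    ei = expected_improvement(mu, sigma, y.max())
    ei[sampled] = -np.inf  # never re-evaluate an already-sampled point
    sampled.append(int(np.argmax(ei)))

evaluated = candidates[sampled]
best_theta = evaluated[np.argmax([performance(t) for t in evaluated])]
```

In the thesis's setting, each call to the objective is expensive (it involves learning an MDP, planning, and running simulations), which is exactly the regime where the sample efficiency of BO pays off over grid or random search.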
