Reasoning about MDPs Abstractly: Bayesian Policy Search with Uncertain Prior Knowledge

Master Thesis (2024)
Author(s)

J. Molhoek (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S. Dumancic – Mentor (TU Delft - Algorithmics)

Frans A. Oliehoek – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2024 Jord Molhoek
Publication Year
2024
Language
English
Graduation Date
01-02-2024
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Many real-world problems fall into the category of sequential decision-making under uncertainty; Markov Decision Processes (MDPs) are a common way to model such problems. To solve an MDP, one can start from scratch, or one may already have an idea of what good policies look like; moreover, this idea may itself be uncertain. In existing literature, policy search is accelerated by encoding this prior knowledge in an action distribution that is used to sample policies. This approach has been extended by inferring the action distribution jointly with the policy through Gibbs sampling. Implicitly, it assumes that good and bad actions generalize over the entire state space.
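To illustrate the idea of encoding prior knowledge as an action distribution used for policy sampling, the following is a minimal sketch. It assumes a small tabular MDP with discrete states and actions and a Dirichlet-categorical encoding of the prior; the constants, pseudo-counts, and function names are hypothetical choices for illustration, not the thesis's exact formulation.

    import numpy as np

    rng = np.random.default_rng(0)
    N_STATES, N_ACTIONS = 20, 4

    # Hypothetical prior knowledge: action 2 is believed to be good in most states,
    # encoded as Dirichlet pseudo-counts over actions, shared across the state space.
    prior_pseudo_counts = np.ones(N_ACTIONS)
    prior_pseudo_counts[2] = 5.0

    def sample_policy(action_dist):
        # One action per state, drawn from the shared action distribution.
        return rng.choice(N_ACTIONS, size=N_STATES, p=action_dist)

    # Draw a shared action distribution from the prior, then sample a candidate policy.
    action_dist = rng.dirichlet(prior_pseudo_counts)
    policy = sample_policy(action_dist)
    print(policy)

In a full policy-search loop, candidate policies sampled this way would be evaluated on the MDP, and the action distribution would be re-inferred from the better-performing policies; the snippet only shows the sampling step.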

This thesis extends the existing method by leveraging a division of the state space into regions and inferring action distributions over these regions, rather than over the entire state space. We show that this can accelerate the policy search. We also show that the algorithm recovers when the division is unjustified. The division into regions can therefore itself be seen as a form of uncertain prior knowledge about the policy. Finally, inference of the regions themselves is also explored and yields promising results.
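The region-based idea can be sketched in the same spirit: below, the state space is split into hypothetical regions, each with its own Dirichlet action distribution, so that prior knowledge about good actions is expressed per region rather than globally. The partition and pseudo-counts are illustrative assumptions, not the thesis's actual experiments.

    import numpy as np

    rng = np.random.default_rng(1)
    N_STATES, N_ACTIONS, N_REGIONS = 20, 4, 2

    # Hypothetical division of the state space into two regions
    # (here simply the first and second half of the states).
    region_of_state = np.repeat(np.arange(N_REGIONS), N_STATES // N_REGIONS)

    # One Dirichlet pseudo-count vector per region instead of one for the whole state space.
    prior_pseudo_counts = np.array([
        [5.0, 1.0, 1.0, 1.0],  # region 0: action 0 believed good
        [1.0, 1.0, 1.0, 5.0],  # region 1: action 3 believed good
    ])

    def sample_policy(region_action_dists):
        # One action per state, drawn from that state's region-specific distribution.
        return np.array([
            rng.choice(N_ACTIONS, p=region_action_dists[region_of_state[s]])
            for s in range(N_STATES)
        ])

    # Draw one action distribution per region, then sample a candidate policy.
    region_action_dists = np.array([rng.dirichlet(a) for a in prior_pseudo_counts])
    policy = sample_policy(region_action_dists)
    print(policy)

If the chosen partition turns out to be unjustified, the per-region distributions can simply drift back toward distinct per-state behaviour during inference, which is the recovery property mentioned in the abstract.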

Files

Thesis_Jord_Molhoek.pdf
(pdf | 14.5 MB)
License info not available