Reasoning about MDPs Abstractly: Bayesian Policy Search with Uncertain Prior Knowledge

Abstract

Many real-world problems involve sequential decision-making under uncertainty, and Markov Decision Processes (MDPs) are a common framework for modeling them. When solving an MDP, one can start from scratch, or one may already have an idea of what good policies look like; that idea may itself be uncertain. In existing literature, the policy search procedure is accelerated by encoding such prior knowledge in an action distribution from which candidate policies are sampled, and this approach has been extended to infer the action distribution jointly with the policy through Gibbs sampling. Implicitly, this assumes that good and bad actions generalize over the entire state space. This thesis extends the existing method by dividing the state space into regions and inferring action distributions over these regions rather than over the entire state space. We show that this can accelerate the policy search, and that the algorithm recovers when the division is unjustified; the division into regions can therefore itself be viewed as a form of uncertain prior knowledge about the policy. Finally, inference of the regions themselves is also explored and yields promising results.
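
To make the core idea concrete, the sketch below illustrates region-wise action priors in a toy policy-search loop. It is only a minimal illustration under simplifying assumptions, not the thesis's actual algorithm (which infers the action distributions via Gibbs sampling): the region assignment region_of_state, the pseudo-count update in update_prior, and the placeholder evaluate_policy are all hypothetical stand-ins introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, N_REGIONS = 12, 4, 3

# Hypothetical division of the state space into regions (assumed given here;
# the thesis also explores inferring this division).
region_of_state = rng.integers(N_REGIONS, size=N_STATES)

# One Dirichlet prior over actions per region, instead of a single prior
# shared across the entire state space.
alpha = np.ones((N_REGIONS, N_ACTIONS))

def sample_policy():
    """Sample a deterministic policy using the per-region action distributions."""
    theta = np.array([rng.dirichlet(alpha[r]) for r in range(N_REGIONS)])
    return np.array([rng.choice(N_ACTIONS, p=theta[region_of_state[s]])
                     for s in range(N_STATES)])

def update_prior(policy):
    """Crude pseudo-count update: reinforce the actions of a policy judged good."""
    for s, a in enumerate(policy):
        alpha[region_of_state[s], a] += 1.0

def evaluate_policy(policy):
    """Placeholder for an estimate of the policy's expected return in the MDP."""
    return rng.random()

best_policy, best_score = None, -np.inf
for _ in range(100):
    pi = sample_policy()
    score = evaluate_policy(pi)
    if score > best_score:
        best_policy, best_score = pi, score
        update_prior(pi)  # bias future sampling toward actions that worked well
```

Because the pseudo-counts are maintained per region, evidence that an action works well in one state only influences sampling in states of the same region, which is the weaker generalization assumption the abstract refers to.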