Reasoning about MDPs Abstractly: Bayesian Policy Search with Uncertain Prior Knowledge

Abstract

Many real-world problems involve sequential decision-making under uncertainty, and Markov Decision Processes (MDPs) are a common framework for modeling them. When solving an MDP, one can start from scratch, or one may already have an idea of what good policies look like; that idea may itself be uncertain. In existing literature, the policy search procedure is accelerated by encoding such prior knowledge in an action distribution from which candidate policies are sampled, and this approach has been extended to infer the action distribution jointly with the policy through Gibbs sampling. Implicitly, this assumes that good and bad actions generalize over the entire state space. This thesis extends the existing method by dividing the state space into regions and inferring action distributions over these regions rather than over the entire state space. We show that this can accelerate the policy search, and that the algorithm recovers when the division is unjustified; the division into regions can therefore itself be viewed as a form of uncertain prior knowledge about the policy. Finally, inference of the regions themselves is also explored and yields promising results.
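
To make the core idea concrete, the sketch below illustrates region-wise action priors in a toy policy-search loop. It is only a minimal illustration under simplifying assumptions, not the thesis's actual algorithm (which infers the action distributions via Gibbs sampling): the region assignment region_of_state, the pseudo-count update in update_prior, and the placeholder evaluate_policy are all hypothetical stand-ins introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, N_REGIONS = 12, 4, 3

# Hypothetical division of the state space into regions (assumed given here;
# the thesis also explores inferring this division).
region_of_state = rng.integers(N_REGIONS, size=N_STATES)

# One Dirichlet prior over actions per region, instead of a single prior
# shared across the entire state space.
alpha = np.ones((N_REGIONS, N_ACTIONS))

def sample_policy():
    """Sample a deterministic policy using the per-region action distributions."""
    theta = np.array([rng.dirichlet(alpha[r]) for r in range(N_REGIONS)])
    return np.array([rng.choice(N_ACTIONS, p=theta[region_of_state[s]])
                     for s in range(N_STATES)])

def update_prior(policy):
    """Crude pseudo-count update: reinforce the actions of a policy judged good."""
    for s, a in enumerate(policy):
        alpha[region_of_state[s], a] += 1.0

def evaluate_policy(policy):
    """Placeholder for an estimate of the policy's expected return in the MDP."""
    return rng.random()

best_policy, best_score = None, -np.inf
for _ in range(100):
    pi = sample_policy()
    score = evaluate_policy(pi)
    if score > best_score:
        best_policy, best_score = pi, score
        update_prior(pi)  # bias future sampling toward actions that worked well
```

Because the pseudo-counts are maintained per region, evidence that an action works well in one state only influences sampling in states of the same region, which is the weaker generalization assumption the abstract refers to.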