Bayesian Model-Free Deep Reinforcement Learning
P.R. van der Vaart (TU Delft - Sequential Decision Making)
M.T.J. Spaan – Promotor (TU Delft - Sequential Decision Making)
N. Yorke-Smith – Promotor (TU Delft - Algorithmics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The goal of reinforcement learning is to train agents to perform tasks under little supervision. Tasks are specified by a reward function and a transition function, which state how much reward the agent receives for an action in a given state, and how the environment state changes based on the action the agent took. Reinforcement learning typically assumes no prior knowledge of the reward and transition functions, meaning that agents need to explore the environment and learn essentially through trial and error. Model-free methods attempt to learn which actions lead to good outcomes without modeling the reward function or the environment itself. Efficiently selecting which actions are promising is an active research direction that can greatly reduce the total number of interactions an agent needs to learn a task, potentially opening the door to new applications where trials or simulations are expensive or compute is limited.
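As a point of reference (this is standard textbook material, not a method specific to this dissertation), the tabular Q-learning update that model-free deep methods generalise can be written as

\[
Q(s, a) \leftarrow Q(s, a) + \alpha \big( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big),
\]

where $r$ is the observed reward, $s'$ the next state, $\alpha$ the learning rate, and $\gamma$ the discount factor. The agent learns action values directly from sampled transitions, without ever fitting the reward or transition function.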
Uncertainty quantification is a central mechanism in such efficient exploration methods. Provided with an estimate of how certain the agent is about the outcome of an action, it can intelligently weigh whether that action is worth exploring. The Bayesian paradigm is one way to quantify uncertainty in machine learning: it models uncertainty with a probability distribution over models, specifying how likely each model is given the data the agent has collected.
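Concretely (standard Bayes' rule, included here only for illustration), the posterior over model parameters $\theta$, such as the weights of a Q-network, given collected data $\mathcal{D}$ is

\[
p(\theta \mid \mathcal{D}) \;\propto\; p(\mathcal{D} \mid \theta)\, p(\theta),
\]

where $p(\theta)$ encodes the prior and $p(\mathcal{D} \mid \theta)$ the likelihood; the assumptions behind both terms are exactly what Chapter 4 scrutinises.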
We adopt a Bayesian point of view on model-free reinforcement learning, and develop a deeper understanding of when Bayesian reinforcement learning methods can be expected to work well and which challenges remain. To this end, in Chapter 2 we propose training ensembles through Sequential Monte Carlo, obtaining a sample from the posterior distribution of a deep Q-learning agent. We observe that agents are able to perform directed exploration, although not necessarily more efficiently than standard ensembles in every environment. Furthermore, in Chapter 3 we theoretically analyze existing Bayesian deep model-free reinforcement learning methods, and unify them into a single theoretical framework we call Epistemic Bellman Operators. We prove that these operators are contractions, establishing convergence of the derived algorithms in a simplified setting. Finally, in Chapter 4 we analyze the likelihood and prior assumptions in existing Bayesian deep model-free reinforcement learning methods, and find through statistical tests that the standard likelihood assumptions are violated on every benchmark we tested. We also find that we can improve the performance of Bayesian model-free reinforcement learning methods by choosing different priors based on empirical data from unrelated tasks, which transfer to new environments.
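To make the exploration mechanism concrete, the sketch below shows posterior-sampling-style exploration with an ensemble of tabular Q-functions on a toy chain environment. It is a minimal illustration only: the environment, hyperparameters, and update rule are hypothetical stand-ins for the Sequential Monte Carlo machinery of Chapter 2, which additionally reweights and resamples ensemble members.

```python
import numpy as np

# Toy chain MDP: states 0..N_STATES-1, actions {0: left, 1: right}.
# Reward is given only at the rightmost state; purely illustrative.
N_STATES, N_ACTIONS, HORIZON = 6, 2, 20
ENSEMBLE_SIZE, ALPHA, GAMMA = 5, 0.1, 0.99

rng = np.random.default_rng(0)

def step(state, action):
    """Deterministic chain dynamics with a reward at the right end."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

# Ensemble of Q-tables; random initialisation stands in for prior disagreement.
q_ensemble = rng.normal(scale=0.1, size=(ENSEMBLE_SIZE, N_STATES, N_ACTIONS))

for episode in range(200):
    k = rng.integers(ENSEMBLE_SIZE)  # draw one member, as a posterior sample
    state = 0
    for _ in range(HORIZON):
        # Act greedily w.r.t. the sampled member for the whole episode.
        action = int(np.argmax(q_ensemble[k, state]))
        next_state, reward = step(state, action)
        # Update every member on the shared transition (bootstrap masks omitted).
        for m in range(ENSEMBLE_SIZE):
            target = reward + GAMMA * np.max(q_ensemble[m, next_state])
            q_ensemble[m, state, action] += ALPHA * (target - q_ensemble[m, state, action])
        state = next_state

print(np.round(q_ensemble.mean(axis=0), 2))  # ensemble-mean Q-values
```

Because each episode acts greedily with respect to a single randomly drawn member, disagreement between members translates into directed exploration, which is the behaviour studied empirically in Chapter 2.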
This dissertation establishes several desirable properties of Bayesian deep model-free reinforcement learning, but also raises some key issues, most notably the misspecification identified in Chapter 4. We hope our findings convince other Bayesian reinforcement learning researchers to pay more attention to assumptions about priors and likelihoods.