Bayesian Ensembles for Exploration in Deep Q-Learning

Conference Paper (2024)
Authors

Pascal R. van der Vaart (TU Delft - Sequential Decision Making)

N. Yorke-Smith (TU Delft - Algorithmics)

M.T.J. Spaan (TU Delft - Sequential Decision Making)

Research Group
Sequential Decision Making
Publication Year
2024
Language
English
Pages (from-to)
2528-2530
ISBN (electronic)
9798400704864
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Exploration in reinforcement learning remains a difficult challenge. To drive exploration, ensembles with randomized prior functions have recently been popularized as a way to quantify uncertainty in the value model. However, there is no theoretical reason for these ensembles to resemble the actual posterior. In this work, we view ensemble training from the perspective of Sequential Monte Carlo, a Monte Carlo method that approximates a sequence of distributions with a set of particles. In particular, we propose an algorithm that exploits both the practical flexibility of ensembles and the theory of the Bayesian paradigm. We incorporate this method into a standard deep Q-learning agent (DQN) and experimentally show qualitatively good uncertainty quantification and improved exploration capabilities over a regular ensemble.
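
As a rough illustration only, and not the paper's exact algorithm, the Sequential Monte Carlo view of ensemble training can be sketched as follows: each ensemble member is a particle, particles are reweighted by a likelihood of the observed temporal-difference errors, and the ensemble is resampled when the effective sample size collapses. The linear Q-function, the Gaussian TD likelihood, and all names and hyperparameters below are illustrative assumptions of this sketch.

import numpy as np

rng = np.random.default_rng(0)

N_PARTICLES = 10   # ensemble size (number of particles)
N_FEATURES = 4     # size of the state feature vector
N_ACTIONS = 2
SIGMA = 1.0        # assumed observation-noise scale in the TD likelihood

# Each particle is a linear Q-function: Q(s, a) = w[a] @ phi(s).
particles = rng.normal(size=(N_PARTICLES, N_ACTIONS, N_FEATURES))
log_weights = np.zeros(N_PARTICLES)

def q_values(w, phi):
    # Q-values of one particle at state features phi, shape (N_ACTIONS,).
    return w @ phi

def normalized_weights():
    w = np.exp(log_weights - log_weights.max())
    return w / w.sum()

def smc_update(phi, a, r, phi_next, done, gamma=0.99, lr=1e-2):
    # Reweight, move, and (if needed) resample the particles on one transition.
    global particles
    for i in range(N_PARTICLES):
        w = particles[i]
        target = r + (0.0 if done else gamma * q_values(w, phi_next).max())
        td = target - q_values(w, phi)[a]
        # Reweight: Gaussian likelihood of the TD error (a sketch assumption).
        log_weights[i] += -0.5 * (td / SIGMA) ** 2
        # Move: one gradient step on the squared TD error per particle.
        particles[i, a] += lr * td * phi
    # Resample when the effective sample size drops below half the ensemble.
    weights = normalized_weights()
    ess = 1.0 / np.sum(weights ** 2)
    if ess < N_PARTICLES / 2:
        idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=weights)
        particles = particles[idx].copy()
        # Jitter duplicates so the resampled ensemble does not collapse.
        particles += 0.01 * rng.normal(size=particles.shape)
        log_weights[:] = 0.0

def act(phi):
    # Thompson-sampling-style exploration: act greedily with respect to
    # a single particle drawn according to the current weights.
    w = particles[rng.choice(N_PARTICLES, p=normalized_weights())]
    return int(np.argmax(q_values(w, phi)))

In this sketch, acting greedily with respect to a single particle drawn from the weighted ensemble gives Thompson-sampling-style exploration, while the small jitter applied after resampling keeps the particle set from collapsing onto duplicates.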