Uncertainty Based Exploration in Reinforcement Learning
Analyzing the Robustness of Bayesian Deep Q-Networks
S. Schwartz (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Neil Yorke-Smith – Mentor (TU Delft - Algorithmics)
P.R. van der Vaart – Mentor (TU Delft - Sequential Decision Making)
Matthijs T. J. Spaan – Graduation committee member (TU Delft - Sequential Decision Making)
Abstract
Bayesian Deep Q-Networks (BDQN) have demonstrated superior exploration and performance in complex environments such as Atari games, yet their behavior in simpler settings and their sensitivity to hyperparameters remain understudied. This work evaluates BDQN on both contextual bandit and reinforcement learning tasks, compares it against the standard ϵ-greedy exploration strategy, and analyzes its hyperparameter sensitivity. Our results indicate that BDQN outperforms ϵ-greedy DQN in exploration-heavy environments, most notably Deep Sea with its sparse rewards, but performs comparably in simpler tasks where exploration is less critical. The sensitivity analysis reveals that the forgetting factor (α) plays a central role in modulating exploration, while other hyperparameters, such as batch size, also affect performance to varying degrees. These findings suggest that BDQN is a promising strategy for complex tasks requiring persistent exploration, though it introduces additional tuning complexity.
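To make the comparison in the abstract concrete, the sketch below contrasts the two exploration rules: an ϵ-greedy choice over point-estimate Q-values versus a BDQN-style Thompson-sampling choice, where a Bayesian linear regression over (hypothetical) last-layer features is maintained per action. It is a minimal illustration, not the thesis' implementation; in particular, the way the forgetting factor α discounts the posterior's sufficient statistics is an assumption about one common formulation, and all names (BayesianLastLayer, feat_dim, etc.) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


class BayesianLastLayer:
    """Per-action Bayesian linear regression over last-layer features.

    Sketch of a BDQN-style Thompson-sampling head; alpha discounts the
    accumulated sufficient statistics (one plausible reading of the
    forgetting factor discussed in the abstract, not the exact thesis setup).
    """

    def __init__(self, n_actions, feat_dim, noise_var=1.0, prior_var=1.0, alpha=0.9):
        self.noise_var = noise_var
        self.alpha = alpha
        # Gaussian posterior per action: precision matrix and precision-weighted mean.
        self.precision = np.stack([np.eye(feat_dim) / prior_var] * n_actions)
        self.b = np.zeros((n_actions, feat_dim))

    def update(self, action, phi, target):
        """Fold one (feature, TD-target) pair into the posterior of `action`."""
        # Forgetting: shrink old statistics before adding the new observation.
        self.precision[action] *= self.alpha
        self.b[action] *= self.alpha
        self.precision[action] += np.outer(phi, phi) / self.noise_var
        self.b[action] += phi * target / self.noise_var

    def sample_q(self, phi):
        """Thompson sampling: draw one weight vector per action, return Q(s, .)."""
        q = np.empty(len(self.b))
        for a in range(len(self.b)):
            cov = np.linalg.inv(self.precision[a])
            mean = cov @ self.b[a]
            w = rng.multivariate_normal(mean, cov)
            q[a] = w @ phi
        return q


def epsilon_greedy(q_values, epsilon=0.1):
    """Baseline exploration rule the abstract compares against."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


# Toy usage: 3 actions, 4-dimensional last-layer features.
head = BayesianLastLayer(n_actions=3, feat_dim=4, alpha=0.9)
phi = rng.normal(size=4)
head.update(action=1, phi=phi, target=1.0)
bdqn_action = int(np.argmax(head.sample_q(phi)))      # Thompson-sampling choice
baseline_action = epsilon_greedy(head.sample_q(phi))  # ϵ-greedy choice
print(bdqn_action, baseline_action)
```

Smaller α forgets old evidence faster, keeping the posterior wide and the sampled Q-values more varied, which is the mechanism by which α modulates exploration in this sketch.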