P.R. van der Vaart | TU Delft Repository

Uncertainty Based Exploration in Reinforcement Learning

Analyzing the Robustness of Bayesian Deep Q-Networks

Bachelor thesis (2025) - S. Schwartz (author), N. Yorke-Smith (mentor), P.R. van der Vaart (mentor), Matthijs TJ Spaan (graduation committee member)

Bayesian Deep Q-Networks (BDQN) have demonstrated superior exploration capabilities and performance in complex environments such as Atari games, yet their behavior in simpler settings and their sensitivity to hyperparameters remain understudied. This work evaluates BDQN in ...

A Unified Scaling Law for Bootstrapped DQNs

Bachelor thesis (2025) - R. Knyazhitskiy (author), P.R. van der Vaart (mentor), N. Yorke-Smith (mentor), Matthijs TJ Spaan (graduation committee member)

We present a large-scale empirical study of Bootstrapped DQN (BDQN) and Randomized-Prior BDQN (RP-BDQN) in the DeepSea environment, aimed at characterizing their scaling properties. Our primary contribution is a unified scaling law that accurately models the probability of reward ...

Using NoisyNet to Improve Exploration in Contextual Bandit Settings

Bachelor thesis (2025) - S. Ruff (author), P.R. van der Vaart (mentor), N. Yorke-Smith (mentor), Matthijs TJ Spaan (graduation committee member)

Efficient exploration is a major issue in reinforcement learning, particularly in environments with sparse rewards. In these environments, traditional methods like ε-greedy fail to efficiently reach an optimal policy. A new method proposed by Fortunato et al. s ...

Revisiting Langevin Monte Carlo Applied to Deep Q-Learning: An Empirical Study of Robustness and Sensitivity

Bachelor thesis (2025) - P. Hendriks (author), N. Yorke-Smith (mentor), P.R. van der Vaart (mentor), Matthijs TJ Spaan (graduation committee member)

Deep Reinforcement Learning has achieved superhuman performance in many tasks, such as robotic control or autonomous driving. Algorithms in Deep Reinforcement Learning still suffer from a sample efficiency problem, where, in many cases, millions of samples are needed to achieve g ...

Empirical Evaluation of Random Network Distillation for DQN Agents

Bachelor thesis (2025) - A. Moreno (author), N. Yorke-Smith (graduation committee member), P.R. van der Vaart (mentor)

This paper investigates how Random Network Distillation (RND), coupled with Boltzmann exploration, influences exploration behaviour and learning dynamics in value-based agents such as Deep Q-Networks (DQN) across a range of environments, from classic control tasks to behaviour su ...