Revisiting Langevin Monte Carlo Applied to Deep Q-Learning: An Empirical Study of Robustness and Sensitivity
P. Hendriks (TU Delft - Electrical Engineering, Mathematics and Computer Science)
N. Yorke-Smith – Mentor (TU Delft - Algorithmics)
P.R. van der Vaart – Mentor (TU Delft - Sequential Decision Making)
Matthijs T. J. Spaan – Graduation committee member (TU Delft - Sequential Decision Making)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Deep Reinforcement Learning has achieved superhuman performance in many tasks, such as robotic control and autonomous driving. However, Deep Reinforcement Learning algorithms still suffer from poor sample efficiency: in many cases, millions of samples are needed to achieve good performance. Recently, Bayesian uncertainty-based algorithms have gained traction. This work focuses on providing a better understanding of the behaviour of Langevin Monte Carlo algorithms for Bayesian posterior approximation applied on top of deep Q-learning. Building on existing algorithms, we aim to clarify the underlying mechanics that drive them. We empirically evaluate different hyperparameter settings in three environments. Our results suggest that hyperparameters previously thought to have little impact on these algorithms are crucial for deep exploration.
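For intuition, the sketch below shows one possible form of a Langevin Monte Carlo (stochastic gradient Langevin dynamics) parameter update applied to a Q-network: the usual gradient step on a temporal-difference loss plus injected Gaussian noise, so that the parameters approximately sample from a posterior rather than converge to a point estimate. This is a minimal illustration under assumed conventions, not the implementation evaluated in the thesis; names such as `q_net`, `step_size`, and `temperature` are illustrative placeholders.

```python
import torch

def sgld_step(q_net, td_loss, step_size=1e-4, temperature=1e-6):
    """One SGLD-style update of a Q-network's parameters.

    theta <- theta - step_size * grad(td_loss)
             + sqrt(2 * step_size * temperature) * N(0, I)
    """
    # Compute gradients of the temporal-difference loss.
    q_net.zero_grad()
    td_loss.backward()

    # Gradient step plus Gaussian noise scaled by the Langevin schedule.
    with torch.no_grad():
        for p in q_net.parameters():
            if p.grad is None:
                continue
            noise = torch.randn_like(p) * (2.0 * step_size * temperature) ** 0.5
            p.add_(-step_size * p.grad + noise)
```

With temperature set to zero this reduces to plain stochastic gradient descent; the noise term is what turns the optimizer into a posterior sampler, and its scale is exactly the kind of hyperparameter whose influence on exploration the thesis studies empirically.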