A Unified Scaling Law for Bootstrapped DQNs

Bachelor Thesis (2025)
Author(s)

R. Knyazhitskiy (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

P.R. van der Vaart – Mentor (TU Delft - Sequential Decision Making)

N. Yorke-Smith – Mentor (TU Delft - Algorithmics)

M.T.J. Spaan – Graduation committee member (TU Delft - Sequential Decision Making)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
27-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We present a large-scale empirical study of Bootstrapped DQN (BDQN) and Randomized-Prior BDQN (RP-BDQN) in the DeepSea environment, aimed at characterizing their scaling properties. Our primary contribution is a unified scaling law that accurately models the probability of reward discovery as a function of task hardness and ensemble size. This law is parameterized by a method-dependent effectiveness factor, $\psi$. Under this framework, RP-BDQN demonstrates substantially higher effectiveness ($\psi \approx 0.87$) than BDQN ($\psi \approx 0.80$), enabling it to solve more challenging tasks. Our analysis reveals that this advantage stems from RP-BDQN's sustained ensemble diversity, which mitigates the posterior collapse observed in BDQN. Furthermore, we demonstrate diminishing returns in performance for ensemble sizes $K > 10$. These results offer practical guidance for ensemble configuration and raise new theoretical questions surrounding the effectiveness parameter $\psi$.

Files

Preprint.pdf
(pdf | 2.91 Mb)
License info not available