A Unified Scaling Law for Bootstrapped DQNs

Bachelor Thesis (2025)

Author(s)

R. Knyazhitskiy (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

P.R. van der Vaart – Mentor (TU Delft - Sequential Decision Making)

N. Yorke-Smith – Mentor (TU Delft - Algorithmics)

M.T.J. Spaan – Graduation committee member (TU Delft - Sequential Decision Making)

Faculty

Electrical Engineering, Mathematics and Computer Science

Machine learning, Reinforcement learning, Scaling study

To reference this document use:

https://resolver.tudelft.nl/uuid:6fb011b5-c2dc-429f-8ae8-e68ecc7d1680

Publication Year

2025

Language

English

Graduation Date

27-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We present a large-scale empirical study of Bootstrapped DQN (BDQN) and Randomized-Prior BDQN (RP-BDQN) in the DeepSea environment, aimed at characterizing their scaling properties. Our primary contribution is a unified scaling law that accurately models the probability of reward discovery as a function of task hardness and ensemble size. This law is parameterized by a method-dependent effectiveness factor, $\psi$. Under this framework, RP-BDQN demonstrates substantially higher effectiveness ($\psi \approx 0.87$) than BDQN ($\psi \approx 0.80$), enabling it to solve more challenging tasks. Our analysis reveals that this advantage stems from RP-BDQN's sustained ensemble diversity, which mitigates the posterior collapse observed in BDQN. Furthermore, we demonstrate diminishing returns in performance for ensemble sizes $K > 10$. These results offer practical guidance for ensemble configuration and raise new theoretical questions surrounding the effectiveness parameter $\psi$.

Files

Preprint.pdf

(pdf | 2.91 MB)

License info not available