A Unified Scaling Law for Bootstrapped DQNs
R. Knyazhitskiy (TU Delft - Electrical Engineering, Mathematics and Computer Science)
P.R. van der Vaart – Mentor (TU Delft - Sequential Decision Making)
N. Yorke-Smith – Mentor (TU Delft - Algorithmics)
MTJ Spaan – Graduation committee member (TU Delft - Sequential Decision Making)
Abstract
We present a large-scale empirical study of Bootstrapped DQN (BDQN) and Randomized-Prior BDQN (RP-BDQN) in the DeepSea environment, aimed at characterizing their scaling properties. Our primary contribution is a unified scaling law that accurately models the probability of reward discovery as a function of task hardness and ensemble size. This law is parameterized by a method-dependent effectiveness factor, $\psi$. Under this framework, RP-BDQN demonstrates substantially higher effectiveness ($\psi \approx 0.87$) than BDQN ($\psi \approx 0.80$), which enables it to solve more challenging tasks. Our analysis reveals that this advantage stems from RP-BDQN's sustained ensemble diversity, which mitigates the posterior collapse observed in BDQN. Furthermore, we demonstrate diminishing returns in performance for ensemble sizes $K > 10$. These results offer practical guidance for ensemble configuration and raise new theoretical questions about the effectiveness parameter $\psi$.
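To make "task hardness" in DeepSea concrete, here is a minimal sketch, assuming hardness is measured by the grid size N (as is common for DeepSea) and using a simplified, deterministic version of the environment loosely modeled on the bsuite one. It is not the code used in this study, and the function names `deep_sea_episode`, `exact_random_success` and `random_success_rate` are hypothetical. The sketch shows why the task gets hard quickly: the reward sits in the bottom-right corner, so an agent must choose "right" on every one of the N steps, and a uniformly random policy succeeds with probability only 2^(-N).

```python
import random
from typing import Callable

# Hypothetical policy type: maps (row, column) to True ("right") or False ("left").
Policy = Callable[[int, int], bool]


def deep_sea_episode(size: int, choose_right: Policy) -> float:
    """Return the undiscounted return of one simplified, deterministic DeepSea episode.

    Assumed dynamics (a sketch, not the study's environment): the agent starts in
    the top-left cell of a size x size grid and moves down one row per step.
    Moving right costs 0.01 / size; the reward of 1 is paid only for choosing
    "right" while already in the last column, which is first reachable on the
    final row.
    """
    col, ret = 0, 0.0
    for row in range(size):
        if choose_right(row, col):
            if col == size - 1:
                ret += 1.0  # treasure in the bottom-right corner
            ret -= 0.01 / size  # small penalty for every "right" move
            col = min(col + 1, size - 1)
        else:
            col = max(col - 1, 0)  # "left" is free but leads away from the reward
    return ret


def exact_random_success(size: int) -> float:
    """A uniform random policy must pick "right" on all `size` steps."""
    return 0.5 ** size


def random_success_rate(size: int, episodes: int, seed: int = 0) -> float:
    """Monte Carlo estimate of how often a uniform random policy finds the reward."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(episodes):
        ret = deep_sea_episode(size, lambda row, col: rng.random() < 0.5)
        if ret > 0.5:  # successful episodes return 0.99, failures return <= 0
            hits += 1
    return hits / episodes


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, exact_random_success(n), random_success_rate(n, 20_000))
```

Even at N = 10, a random agent in this sketch finds the reward in roughly one episode out of 1,024, and the odds halve with every extra row. This is the kind of problem that calls for the deep exploration that ensemble methods such as BDQN and RP-BDQN are designed to provide.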