Empirical Evaluation of Random Network Distillation for DQN Agents
A. Moreno (TU Delft - Electrical Engineering, Mathematics and Computer Science)
N. Yorke-Smith – Graduation committee member (TU Delft - Algorithmics)
P.R. van der Vaart – Mentor (TU Delft - Sequential Decision Making)
Abstract
This paper investigates how Random Network Distillation (RND), coupled with Boltzmann exploration, influences exploration behaviour and learning dynamics in value-based agents such as Deep Q-Networks (DQN) across a range of environments, from classic control tasks to behaviour suite benchmarks and contextual bandits. The study addresses the sensitivity of RND to key hyperparameters, the impact of exploration strategy design, and the transferability of settings across tasks. The results show that RND remains beneficial within DQN in both sequential and non-sequential tasks, but requires careful tuning of reward scaling, temperature, and network capacity to be effective. No universal hyperparameter configuration generalizes across environments, and inappropriate tuning can lead to unstable learning or suboptimal outcomes. These findings provide practical insights into the strengths and limitations of applying RND within value-based reinforcement learning frameworks.
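To make the two components named in the abstract concrete, below is a minimal sketch of how an RND intrinsic bonus and Boltzmann (softmax) action selection are typically combined with a DQN-style agent. It is illustrative only, not the paper's implementation: the callables target_net and predictor_net, and the scaling coefficient beta, are assumed placeholders.

```python
import numpy as np

def rnd_intrinsic_reward(obs, target_net, predictor_net):
    """Intrinsic reward as the prediction error between a fixed, randomly
    initialised target network and a predictor trained to imitate it.
    Both are assumed to be callables mapping an observation to a feature
    vector of the same shape."""
    target_feat = target_net(obs)      # never trained
    pred_feat = predictor_net(obs)     # trained on visited observations
    return float(np.mean((target_feat - pred_feat) ** 2))

def boltzmann_action(q_values, temperature):
    """Softmax exploration over Q-values; lower temperature concentrates
    probability mass on the greedy action."""
    prefs = np.asarray(q_values, dtype=np.float64) / temperature
    prefs -= prefs.max()               # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(np.random.choice(len(probs), p=probs))

# Hypothetical combined reward used in the Q-learning target, where beta
# is the intrinsic-reward scale mentioned in the abstract's tuning discussion:
# r_total = r_extrinsic + beta * rnd_intrinsic_reward(next_obs, target_net, predictor_net)
```

In this sketch, the intrinsic-reward scale (beta), the Boltzmann temperature, and the capacity of the predictor network correspond to the hyperparameters the abstract identifies as requiring careful, per-environment tuning.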