Evaluating the robustness of DQN and QR-DQN under domain randomization
Analyzing the effects of domain variation on value-based methods
Y. Zwetsloot (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.M. Celikok – Mentor (TU Delft - Sequential Decision Making)
F.A. Oliehoek – Mentor (TU Delft - Sequential Decision Making)
Annibale Panichella – Graduation committee member (TU Delft - Software Engineering)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Domain randomization (DR) is a widely used technique in reinforcement learning to improve robustness and enable sim-to-real transfer. While prior work has focused extensively on DR in combination with algorithms such as PPO and SAC, its effects on value-based methods like DQN and QR-DQN remain underexplored. This paper investigates how varying degrees and types of DR affect the robustness and generalization capabilities of agents trained with DQN and QR-DQN. We identify clear differences in how DQN and QR-DQN respond to domain randomization: naive application can hinder performance, whereas well-targeted randomization distributions can enhance robustness and generalization. These findings underscore the importance of tailored DR strategies for different algorithms and contribute to a deeper understanding of DR’s role in DQN-based methods.
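The abstract does not spell out the experimental setup, so the snippet below is only a minimal sketch of what domain randomization over environment parameters can look like in practice: an environment wrapper that resamples selected physics parameters at every episode reset, inside which a DQN or QR-DQN agent would then be trained. The wrapper name, the choice of a Gymnasium CartPole-style environment, and the specific attribute names and ranges (`length`, `masspole`) are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
import gymnasium as gym


class DomainRandomizationWrapper(gym.Wrapper):
    """Resample selected physics parameters at every episode reset.

    `param_ranges` maps attribute names on the unwrapped environment
    (e.g. CartPole's `length` and `masspole`) to (low, high) intervals.
    Wider intervals correspond to a higher "degree" of randomization.
    """

    def __init__(self, env, param_ranges, seed=None):
        super().__init__(env)
        self.param_ranges = param_ranges
        self.rng = np.random.default_rng(seed)

    def reset(self, **kwargs):
        # Draw a fresh value for each randomized parameter before resetting.
        for name, (low, high) in self.param_ranges.items():
            setattr(self.env.unwrapped, name, self.rng.uniform(low, high))
        return self.env.reset(**kwargs)


if __name__ == "__main__":
    # Illustrative ranges only; the paper's actual distributions differ.
    env = DomainRandomizationWrapper(
        gym.make("CartPole-v1"),
        param_ranges={"length": (0.3, 0.7), "masspole": (0.05, 0.2)},
        seed=0,
    )
    obs, info = env.reset()
```

In this kind of setup, narrowing or widening `param_ranges`, or randomizing different parameters, is how one would vary the "degrees and types" of DR referred to above; the trained agent's robustness is then typically evaluated on parameter values inside and outside the training ranges.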