Evaluating the Robustness of DQN and QR-DQN in Traffic Simulation

Analyzing the Effect of Quantile Manipulation in Environmental Variability

Bachelor Thesis (2025)
Author(s)

C. Toadere (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.M. Celikok – Mentor (TU Delft - Sequential Decision Making)

F.A. Oliehoek – Graduation committee member (TU Delft - Sequential Decision Making)

Annibale Panichella – Graduation committee member (TU Delft - Software Engineering)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
25-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

As autonomous driving systems advance, ensuring the robustness of the underlying decision-making algorithms becomes increasingly critical. This study assesses the performance and reliability of two reinforcement learning models, Deep Q-Network (DQN) and Quantile Regression DQN (QR-DQN), in a simulated highway environment. While DQN has been widely adopted for its simplicity and effectiveness in discrete action spaces, it suffers from overestimation bias and poor performance in out-of-distribution environments. QR-DQN addresses some of these limitations by modelling the distribution over returns with quantile regression, offering a richer representation of uncertainty. This research focuses on two core objectives: (1) implementing a risk-averse decision-making strategy that uses the quantiles of QR-DQN to enhance safety and reliability, and (2) evaluating the robustness of DQN and QR-DQN as the test environment deviates from the training conditions. The results expose the limitations of DQN and demonstrate QR-DQN’s greater robustness across varied environments. Moreover, a better-performing variant of QR-DQN is presented, which employs conservative behaviour through the use of its quantiles. This variant highlights the trade-off between maximising rewards and avoiding collisions, providing a safer approach.
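The risk-averse strategy described in the abstract can be illustrated with a minimal sketch. Standard QR-DQN acts greedily with respect to the mean of each action's quantile estimates; a conservative variant can instead average only the lowest fraction of quantiles (a CVaR-style criterion), penalising actions with heavy lower tails such as rare collisions. The function name, the `alpha` parameter, and the toy quantile values below are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def risk_averse_action(quantiles, alpha=0.25):
    """Pick an action by averaging only the lowest `alpha` fraction of
    each action's return quantiles (CVaR-style), instead of the full
    quantile mean that standard QR-DQN uses.

    quantiles: array of shape (n_actions, n_quantiles), each row sorted
    ascending, approximating the return distribution of one action.
    """
    n_quantiles = quantiles.shape[1]
    k = max(1, int(alpha * n_quantiles))               # number of low quantiles kept
    worst_case_values = quantiles[:, :k].mean(axis=1)  # average over worst outcomes
    return int(np.argmax(worst_case_values))           # greedy under pessimism

# Toy example: action 0 has the higher mean return but a heavy lower
# tail (e.g. occasional collisions); action 1 is safer but lower-reward.
q = np.array([
    [-10.0, 8.0, 9.0, 10.0],   # risky action: mean 4.25
    [  2.0, 3.0, 4.0,  5.0],   # conservative action: mean 3.5
])
print(risk_averse_action(q, alpha=0.25))  # → 1 (conservative action)
print(int(np.argmax(q.mean(axis=1))))     # → 0 (mean-based QR-DQN choice)
```

With `alpha=0.25` only the single worst quantile per action is considered, so the risky action's -10 tail outweighs its higher mean, reversing the greedy choice. This is the trade-off the abstract refers to: some expected reward is sacrificed in exchange for fewer collisions.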
