Detecting Environment Changes via Quantile Spread in Quantile Regression Deep-Q Networks

Bachelor Thesis (2025)
Author(s)

P. Stan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.M. Celikok – Mentor (TU Delft - Sequential Decision Making)

F.A. Oliehoek – Mentor (TU Delft - Sequential Decision Making)

Annibale Panichella – Graduation committee member (TU Delft - Software Engineering)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
27-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement learning agents are trained in well-defined environments and evaluated under the assumption that test-time conditions match those encountered during training. However, even small changes in the environment's dynamics can degrade the policy's performance, particularly in safety-critical domains. This work investigates the use of Quantile Regression Deep-Q Networks (QR-DQN) to detect environment shifts by analyzing the uncertainty in return predictions. QR-DQN extends Deep-Q learning by estimating the entire distribution of future returns through quantile regression. We hypothesize that, under deterministic settings, the spread of the return distribution, quantified by the inter-quantile range, can indicate whether the environment has changed. The RL agent learns low-spread predictions for familiar dynamics, but when deployed in changed environments, the quantile distribution becomes wider. We conduct experiments on the deterministic CartPole-v1 environment by varying the pole length. We show that the quantile spread remains low under small changes, but increases sharply as the dynamics diverge further from the training setting. Our results indicate the potential of distributional reinforcement learning to enhance reliability and awareness in deployment scenarios.
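As a rough illustration of the spread signal described in the abstract, the sketch below computes an inter-quantile range from a vector of predicted quantile locations for the greedy action. The function name quantile_spread, the 51-quantile setup, and the 25th/75th-percentile bounds are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def quantile_spread(quantile_values: np.ndarray,
                    lower: float = 0.25,
                    upper: float = 0.75) -> float:
    """Inter-quantile range of a QR-DQN return distribution for one action.

    quantile_values: array of shape (num_quantiles,) holding the predicted
    quantile locations for the greedy action (order does not matter).
    lower/upper: quantile fractions bounding the spread (default: the IQR).
    """
    q = np.sort(quantile_values)
    n = len(q)
    lo = q[int(np.floor(lower * (n - 1)))]
    hi = q[int(np.ceil(upper * (n - 1)))]
    return float(hi - lo)

# Hypothetical example: tight estimates for familiar dynamics,
# dispersed estimates after a shift in the environment.
familiar = np.random.normal(loc=200.0, scale=1.0, size=51)
shifted = np.random.normal(loc=150.0, scale=25.0, size=51)
print(quantile_spread(familiar))  # small spread -> dynamics look familiar
print(quantile_spread(shifted))   # large spread -> possible environment change
```

In practice, the quantile locations would come from the trained QR-DQN head for the selected action, and the spread would be compared against a threshold calibrated on the training environment.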

Files

Research_paper-5.pdf
(pdf | 0.457 Mb)
License info not available