Detecting Environment Changes via Quantile Spread in Quantile Regression Deep-Q Networks

Bachelor Thesis (2025)
Author(s)

P. Stan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.M. Celikok – Mentor (TU Delft - Sequential Decision Making)

F.A. Oliehoek – Mentor (TU Delft - Sequential Decision Making)

Annibale Panichella – Graduation committee member (TU Delft - Software Engineering)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
27-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement learning agents are trained in well-defined environments and evaluated under the assumption that test-time conditions match those encountered during training. However, even small changes in the environment's dynamics can degrade the policy's performance, particularly in safety-critical domains. This work investigates the use of Quantile Regression Deep-Q Networks (QR-DQN) to detect environment shifts by analyzing the uncertainty in return predictions. QR-DQN extends Deep-Q learning by estimating the entire distribution of future returns through quantile regression. We hypothesize that, under deterministic settings, the spread of the return distribution, quantified by the inter-quantile range, can indicate whether the environment has changed. The RL agent learns low-spread predictions for familiar dynamics, but when deployed in changed environments, the quantile distribution becomes wider. We conduct experiments on the deterministic CartPole-v1 environment by varying the pole length. We show that the quantile spread remains low under small changes, but increases sharply as the dynamics diverge further from the training setting. Our results indicate the potential of distributional reinforcement learning to enhance reliability and awareness in deployment scenarios.
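As a rough illustration of the spread signal described in the abstract, the sketch below computes an inter-quantile range from a vector of predicted quantile locations for the greedy action. The function name quantile_spread, the 51-quantile setup, and the 25th/75th-percentile bounds are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def quantile_spread(quantile_values: np.ndarray,
                    lower: float = 0.25,
                    upper: float = 0.75) -> float:
    """Inter-quantile range of a QR-DQN return distribution for one action.

    quantile_values: array of shape (num_quantiles,) holding the predicted
    quantile locations for the greedy action (order does not matter).
    lower/upper: quantile fractions bounding the spread (default: the IQR).
    """
    q = np.sort(quantile_values)
    n = len(q)
    lo = q[int(np.floor(lower * (n - 1)))]
    hi = q[int(np.ceil(upper * (n - 1)))]
    return float(hi - lo)

# Hypothetical example: tight estimates for familiar dynamics,
# dispersed estimates after a shift in the environment.
familiar = np.random.normal(loc=200.0, scale=1.0, size=51)
shifted = np.random.normal(loc=150.0, scale=25.0, size=51)
print(quantile_spread(familiar))  # small spread -> dynamics look familiar
print(quantile_spread(shifted))   # large spread -> possible environment change
```

In practice, the quantile locations would come from the trained QR-DQN head for the selected action, and the spread would be compared against a threshold calibrated on the training environment.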

Files

Research_paper-5.pdf
(pdf | 0.457 Mb)
License info not available