Refined Risk Management in Safe Reinforcement Learning with a Distributional Safety Critic


Abstract

Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects with a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good trade-off between safety and performance. However, the distribution of the total safety-cost over different trajectories is still largely unexplored. In this paper, we propose an actor-critic method for safe RL that uses an implicit quantile network to approximate the distribution of accumulated safety-costs. An accurate estimate of this distribution, in particular of its upper tail, greatly improves the performance of risk-averse RL agents. The empirical analysis shows that our method achieves good risk control in complex safety-constrained environments.
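
The abstract describes a distributional safety critic based on an implicit quantile network and a risk-averse objective focused on the upper tail of the accumulated safety-cost. The sketch below is a minimal, illustrative PyTorch rendering of that idea, not the paper's actual implementation: the class and function names (`DistributionalSafetyCritic`, `upper_tail_cvar`), the network sizes, and the CVaR-style tail estimate are assumptions chosen to make the example self-contained.

```python
import torch
import torch.nn as nn


class DistributionalSafetyCritic(nn.Module):
    """IQN-style critic: outputs quantile values of the accumulated safety-cost
    for a batch of state-action pairs and sampled quantile levels."""

    def __init__(self, state_dim, act_dim, hidden_dim=128, n_cos=64):
        super().__init__()
        self.n_cos = n_cos
        self.state_action = nn.Sequential(
            nn.Linear(state_dim + act_dim, hidden_dim), nn.ReLU()
        )
        self.tau_embed = nn.Sequential(nn.Linear(n_cos, hidden_dim), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, state, action, taus):
        # state: (B, state_dim), action: (B, act_dim), taus: (B, N) quantile levels in (0, 1)
        psi = self.state_action(torch.cat([state, action], dim=-1))      # (B, H)
        # Cosine embedding of the quantile levels, as in implicit quantile networks.
        i = torch.arange(1, self.n_cos + 1, device=taus.device).float()  # (n_cos,)
        cos = torch.cos(taus.unsqueeze(-1) * i * torch.pi)               # (B, N, n_cos)
        phi = self.tau_embed(cos)                                        # (B, N, H)
        # Hadamard product merges state-action and quantile-level information.
        z = psi.unsqueeze(1) * phi                                       # (B, N, H)
        return self.head(z).squeeze(-1)                                  # (B, N) quantile values


def upper_tail_cvar(critic, state, action, alpha=0.9, n_samples=32):
    """Hypothetical risk estimate: approximate CVaR of the safety-cost at level
    alpha by sampling quantile levels from Uniform(alpha, 1) and averaging."""
    taus = alpha + (1.0 - alpha) * torch.rand(
        state.shape[0], n_samples, device=state.device
    )
    return critic(state, action, taus).mean(dim=1)                       # (B,)
```

In this reading, the actor would be penalized using the tail estimate returned by `upper_tail_cvar`, so that trajectories with rare but large accumulated safety-costs are discouraged even when the expected safety-cost looks acceptable.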