WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Conference Paper (2021)
Author(s)

Qisong Yang (TU Delft - Algorithmics)

Thiago D. Simão (TU Delft - Algorithmics)

Simon H. Tindemans (TU Delft - Intelligent Electrical Power Grids)

Matthijs T.J. Spaan (TU Delft - Algorithmics)

Research Group
Algorithmics
Copyright
© 2021 Q. Yang, T. D. Simão, Simon H. Tindemans, M.T.J. Spaan
Publication Year
2021
Language
English
Pages (from-to)
10639-10646
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety measure to judge the constraint satisfaction, which guides the change of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control compared to expectation-based methods.
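To illustrate why a constraint on the expected cost can hide tail risk, the following minimal sketch compares the mean of a set of sampled episode costs with their empirical conditional Value-at-Risk (the average of the worst fraction of outcomes). It is not the paper's implementation: the function names `empirical_cvar` and `update_safety_weight`, the sample data, the bound, and the generic Lagrangian-style weight update are all assumptions made for illustration only.

```python
import math


def empirical_cvar(costs, alpha):
    """Average of the worst alpha-fraction of cost samples (higher = worse).

    Hypothetical helper: a sample-based estimate, not the paper's estimator.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(costs, reverse=True)          # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))    # size of the tail
    tail = ordered[:k]
    return sum(tail) / len(tail)


def update_safety_weight(weight, safety_measure, bound, lr):
    """Assumed Lagrangian-style rule: raise the weight while the safety
    measure exceeds the bound, lower it otherwise, never below zero."""
    return max(0.0, weight + lr * (safety_measure - bound))


# Made-up data: seven harmless episodes and one very costly one.
costs = [0.0] * 7 + [80.0]
bound = 25.0

mean_cost = sum(costs) / len(costs)
cvar_cost = empirical_cvar(costs, alpha=0.25)

print("mean cost within bound:", mean_cost <= bound)
print("CVaR cost within bound:", cvar_cost <= bound)
print("updated weight:", update_safety_weight(1.0, cvar_cost, bound, lr=0.1))
```

In this toy data the expectation-based check passes while the tail-based check fails, so a weight driven by the tail measure would keep increasing the emphasis on safety until the worst outcomes also meet the bound.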

Files

17272_Article_Text_20766_1_2_2... (pdf)
(pdf | 3.4 Mb)
- Embargo expired on 15-11-2021
License info not available