Training and Transferring Safe Policies in Reinforcement Learning

Conference Paper (2022)
Author(s)

Qisong Yang (TU Delft - Algorithmics)

Thiago D. Simão (TU Delft - Algorithmics)

Nils Jansen (Radboud Universiteit Nijmegen)

Simon H. Tindemans (TU Delft - Intelligent Electrical Power Grids)

M.T.J. Spaan (TU Delft - Algorithmics)

Research Group
Algorithmics
Copyright
© 2022 Q. Yang, T. D. Simão, Nils Jansen, Simon H. Tindemans, M.T.J. Spaan
Publication Year
2022
Language
English
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Safety is critical to broadening the application of reinforcement learning (RL). RL agents are often trained in a controlled environment, such as a laboratory, before being deployed in the real world. However, the target reward might be unknown prior to deployment. Reward-free RL addresses this problem by training an agent without access to the reward so that it can adapt quickly once the reward is revealed.
We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are no longer allowed. Thus, the guide is leveraged to compose a safe sampling policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable, and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.
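
The idea of regularizing the student towards the guide and then fading the guide out can be illustrated with a minimal sketch. It assumes discrete action distributions, a KL-divergence penalty from the student to the guide, and an exponentially decaying weight on that penalty. The names `kl_divergence`, `guide_weight`, and `student_loss`, as well as the decay schedule, are hypothetical: the abstract does not specify which divergence or schedule the paper uses.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities.

    Assumes q assigns nonzero probability wherever p does.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def guide_weight(step, initial=1.0, decay=0.999):
    """Hypothetical exponential schedule: the guide's influence shrinks toward 0."""
    return initial * decay ** step


def student_loss(task_loss, student_probs, guide_probs, step):
    """Hypothetical student objective: task loss plus a decaying pull toward the guide."""
    penalty = kl_divergence(student_probs, guide_probs)
    return task_loss + guide_weight(step) * penalty


# Usage: the same divergence from the guide costs much less late in training.
student, guide = [0.9, 0.1], [0.5, 0.5]
early = student_loss(1.0, student, guide, step=0)
late = student_loss(1.0, student, guide, step=10_000)
```

Early in training, any divergence from the guide is penalized at full weight. After many updates the weight is close to zero, so the student is optimized almost purely for the target task.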

Files

ALA2022_paper_34.pdf
(pdf | 9.38 Mb)
License info not available