Training and Transferring Safe Policies in Reinforcement Learning

Abstract

Safety is critical to broadening the application of reinforcement learning (RL). Often, RL agents are trained in a controlled environment, such as a laboratory, before being deployed in the real world. However, the target reward might be unknown prior to deployment. Reward-free RL addresses this problem by training an agent without the reward to adapt quickly once the reward is revealed.
We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. The guide is trained in a controlled environment that allows unsafe interactions while still providing the safety signal. After the target task is revealed, safety violations are no longer allowed, so the guide is leveraged to compose a safe sampling policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable, and gradually remove the guide's influence as training progresses. The empirical analysis shows that this method achieves safe transfer learning and helps the student solve the target task faster.
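
To make the transfer mechanism concrete, here is a minimal sketch of the kind of student update the abstract describes: a policy-gradient loss augmented with a KL penalty towards a frozen guide policy, whose weight decays over training. This is an illustration under assumptions, not the thesis's actual implementation; the function and parameter names and the linear annealing schedule are all hypothetical.

```python
import torch
import torch.nn.functional as F

def student_loss(student_logits, guide_logits, actions, advantages,
                 step, total_steps):
    """Policy-gradient loss with an annealed KL penalty towards the guide.

    student_logits, guide_logits: (batch, n_actions) action logits.
    actions: (batch,) long tensor of sampled actions.
    advantages: (batch,) advantage estimates for the target task.
    The linear annealing schedule below is an assumption for illustration.
    """
    log_probs = F.log_softmax(student_logits, dim=-1)
    # Standard policy-gradient term for the revealed target task.
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    pg_loss = -(chosen * advantages).mean()
    # KL(student || guide) keeps the student close to the safe guide policy.
    guide_log_probs = F.log_softmax(guide_logits.detach(), dim=-1)  # guide is frozen
    kl = (log_probs.exp() * (log_probs - guide_log_probs)).sum(dim=-1).mean()
    # Guide influence decays linearly to zero as training progresses,
    # mirroring the gradual removal of the guide described above.
    beta = max(0.0, 1.0 - step / total_steps)
    return pg_loss + beta * kl
```

The decaying coefficient captures the core idea: the student leans on the safe guide while it is still unreliable early in training, and the regularization vanishes once the student can solve the target task on its own.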