Safe Adaptive Policy Transfer Reinforcement Learning for Distributed Multiagent Control

Journal Article (2025)
Author(s)

Bin Du (Shanghai Jiao Tong University, Northwestern Polytechnical University)

Wei Xie (Shanghai Jiao Tong University)

Yang Li (Hunan University)

Qisong Yang (Xi'an Institute of High-Technology)

Weidong Zhang (Shanghai Jiao Tong University)

R. Negenborn (TU Delft - Transport Engineering and Logistics)

Y. Pang (TU Delft - Transport Engineering and Logistics)

Hongtian Chen (Shanghai Jiao Tong University)

Research Group
Transport Engineering and Logistics
DOI
https://doi.org/10.1109/TNNLS.2023.3326867
Publication Year
2025
Language
English
Bibliographical Note
Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care. Otherwise, as indicated in the copyright section: the publisher is the copyright holder of this work, and the author uses Dutch legislation to make this work public.
Issue number
1
Volume number
36
Pages (from-to)
1939-1946
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multiagent reinforcement learning (RL) training is usually difficult and time-consuming due to mutual interference among agents, and safety concerns make an already difficult training process even harder. This study proposes a safe adaptive policy transfer RL approach for multiagent cooperative control. Specifically, a pioneer and follower off-policy policy transfer learning (PFOPT) method is presented to help follower agents acquire knowledge and experience from a single well-trained pioneer agent. Notably, the designed approach transfers both the policy representation and the sample experience provided by the pioneer policy during off-policy learning. More importantly, the proposed method adaptively adjusts the learning weights of prior experience and exploration according to the Wasserstein distance between the policy probability distributions of the pioneer and the follower. Case studies show that distributed agents trained by the proposed method can complete a collaborative task and maximize rewards while minimizing constraint violations. The proposed method also achieves satisfactory performance in terms of learning speed and success rate.
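The abstract does not spell out the exact weighting rule, but the core mechanism it describes, scaling the follower's reliance on the pioneer's prior experience by the Wasserstein distance between the two policies' action distributions, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the closed-form 2-Wasserstein distance holds for diagonal Gaussian policies, while the exponential mapping adaptive_transfer_weight, its scale parameter, and the placeholder loss terms are hypothetical names introduced only for this example.

import numpy as np

def w2_diag_gaussian(mu_p, sigma_p, mu_f, sigma_f):
    """2-Wasserstein distance between two diagonal-Gaussian policies.

    For diagonal Gaussians the closed form is
    W2^2 = ||mu_p - mu_f||^2 + ||sigma_p - sigma_f||^2.
    """
    return float(np.sqrt(np.sum((mu_p - mu_f) ** 2)
                         + np.sum((sigma_p - sigma_f) ** 2)))

def adaptive_transfer_weight(w_dist, scale=1.0):
    """Map the policy distance to a transfer weight in (0, 1].

    A large distance means the follower has diverged from the pioneer,
    so the weight on the pioneer's prior experience is reduced and the
    follower relies more on its own exploration. The exponential form
    is an assumption for illustration.
    """
    return float(np.exp(-scale * w_dist))

# Hypothetical Gaussian action distributions of pioneer and follower.
mu_pioneer, sigma_pioneer = np.array([0.5, -0.2]), np.array([0.3, 0.3])
mu_follower, sigma_follower = np.array([0.1, 0.4]), np.array([0.5, 0.2])

d = w2_diag_gaussian(mu_pioneer, sigma_pioneer, mu_follower, sigma_follower)
beta = adaptive_transfer_weight(d)

# Blended objective: beta weights the pioneer's prior experience,
# (1 - beta) weights the follower's own exploration objective.
# Both loss values are placeholders, not computed from real data.
loss_prior, loss_explore = 0.8, 1.2
loss_total = beta * loss_prior + (1.0 - beta) * loss_explore
print(f"W2 = {d:.3f}, transfer weight = {beta:.3f}, loss = {loss_total:.3f}")

Under this assumed mapping, the follower leans on the pioneer while the two policies are still close (small distance, weight near 1) and shifts toward its own exploration as they diverge, which matches the adaptive behavior described in the abstract.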

Files

Safe_Adaptive_Policy_Transfer_... (pdf)
(pdf | 11.3 MB)
- Embargo expired on 02-05-2025
License info not available