Learning-Based Multi-UAV Flocking Control With Limited Visual Field and Instinctive Repulsion

Journal Article (2024)
Author(s)

Chengchao Bai (Harbin Institute of Technology)

Peng Yan (Harbin Institute of Technology)

Haiyin Piao (SADRI Institute, Northwestern Polytechnical University)

Wei Pan (TU Delft - Robot Dynamics, The University of Manchester)

Jifeng Guo (Harbin Institute of Technology)

Research Group
Robot Dynamics
Copyright
© 2024 Chengchao Bai, Peng Yan, Haiyin Piao, W. Pan, Jifeng Guo
DOI related publication
https://doi.org/10.1109/TCYB.2023.3246985
Publication Year
2024
Language
English
Issue number
1
Volume number
54
Pages (from-to)
462-475
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This article explores deep reinforcement learning (DRL) for the flocking control of unmanned aerial vehicle (UAV) swarms. The flocking control policy is trained using a centralized-training-decentralized-execution (CTDE) paradigm, in which a centralized critic network, augmented with additional information about the entire UAV swarm, is used to improve learning efficiency. Instead of learning inter-UAV collision avoidance capabilities, a repulsion function is encoded as an inner-UAV "instinct." In addition, the UAVs can obtain the states of other UAVs through onboard sensors in communication-denied environments, and the impact of varying visual fields on flocking control is analyzed. Extensive simulations show that the proposed policy with the repulsion function and limited visual field achieves a success rate of 93.8% in training environments, 85.6% in environments with a high number of UAVs, 91.2% in environments with a high number of obstacles, and 82.2% in environments with dynamic obstacles. Furthermore, the results indicate that the proposed learning-based methods are more suitable than traditional methods in cluttered environments.
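
As an illustration of the two ideas the abstract combines, the following is a minimal sketch, not the authors' implementation. It assumes a 2-D setting, a forward-facing visual cone, and a simple linear repulsion term added to the action of a learned policy. All names and default values (`visible_neighbors`, `repulsion`, `act`, `fov_deg`, `max_range`, `safe_dist`, `gain`) are hypothetical and do not come from the article.

```python
import math


def visible_neighbors(own_pos, own_heading, others, fov_deg=120.0, max_range=10.0):
    """Return the positions in `others` inside a forward-facing visual cone.

    Assumption: the limited visual field is modelled as a cone of width
    `fov_deg` centred on the heading, with a maximum sensing range.
    """
    half_fov = math.radians(fov_deg) / 2.0
    seen = []
    for x, y in others:
        dx, dy = x - own_pos[0], y - own_pos[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > max_range:
            continue
        bearing = math.atan2(dy, dx) - own_heading
        # Wrap the relative bearing into [-pi, pi].
        bearing = math.atan2(math.sin(bearing), math.cos(bearing))
        if abs(bearing) <= half_fov:
            seen.append((x, y))
    return seen


def repulsion(own_pos, neighbors, safe_dist=2.0, gain=1.0):
    """Velocity correction pushing away from neighbors closer than `safe_dist`.

    Assumption: a linear ramp that is zero at `safe_dist` and grows as the
    distance shrinks; the article's actual repulsion function may differ.
    """
    vx = vy = 0.0
    for x, y in neighbors:
        dx, dy = own_pos[0] - x, own_pos[1] - y
        dist = math.hypot(dx, dy)
        if 0.0 < dist < safe_dist:
            scale = gain * (safe_dist - dist) / dist
            vx += scale * dx
            vy += scale * dy
    return vx, vy


def act(policy_action, own_pos, own_heading, others):
    """Combine a learned action with the hard-coded repulsion 'instinct'."""
    seen = visible_neighbors(own_pos, own_heading, others)
    rx, ry = repulsion(own_pos, seen)
    return policy_action[0] + rx, policy_action[1] + ry
```

In this sketch the policy never has to learn inter-UAV avoidance: `act` overrides part of the learned velocity command whenever a visible neighbor comes within `safe_dist`. A neighbor outside the cone produces no repulsion, which is one way a narrow visual field could affect flocking behaviour.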

Files

Learning_Based_Multi_UAV_Flock... (pdf)
(pdf | 7.02 Mb)
- Embargo expired on 08-09-2023
License info not available