Learning scalable and efficient communication policies for multi-robot collision avoidance

Journal Article (2023)

Author(s)

Álvaro Serra-Gómez (TU Delft - Mechanical Engineering)

Hai Zhu (TU Delft - Mechanical Engineering)

B.F. Ferreira de Brito (TU Delft - Mechanical Engineering)

Wendelin Böhmer (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Javier Alonso-Mora (TU Delft - Mechanical Engineering)

Research Group

Learning & Autonomous Control

Keywords

Multi-agent reinforcement learning, Collision avoidance, Multi-robot systems, Aerial robots, Multi-robot communication

DOI related publication

https://doi.org/10.1007/s10514-023-10127-3 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:d0853c0f-0c0f-4ae6-81cf-66ad45f8eb67

Publication Year

2023

Language

English

Journal title

Autonomous Robots

Issue number

8

Volume number

47

Pages (from-to)

1275-1297

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Decentralized multi-robot systems typically perform coordinated motion planning by constantly broadcasting their intentions to avoid collisions. However, the risk of collision between robots varies as they move and communication may not always be needed. This paper presents an efficient communication method that addresses the problem of “when” and “with whom” to communicate in multi-robot collision avoidance scenarios. In this approach, each robot learns to reason about other robots’ states and considers the risk of future collisions before asking for the trajectory plans of other robots. We introduce a new neural architecture for the learned communication policy which allows our method to be scalable. We evaluate and verify the proposed communication strategy in simulation with up to twelve quadrotors, and present results on the zero-shot generalization/robustness capabilities of the policy in different scenarios. We demonstrate that our policy (learned in a simulated environment) can be successfully transferred to real robots.

Files

S10514_023_10127_3.pdf

(pdf | 2.8 Mb)