Decentralized Real-Time Planning for Multi-UAV Cooperative Manipulation via Imitation Learning

Master Thesis (2025)
Author(s)

S. Agarwal (TU Delft - Mechanical Engineering)

Contributor(s)

S. Sun – Mentor (TU Delft - Learning & Autonomous Control)

J. Alonso-Mora – Mentor (TU Delft - Learning & Autonomous Control)

J. Kober – Graduation committee member (TU Delft - Learning & Autonomous Control)

J.W. Böhmer – Graduation committee member (TU Delft - Sequential Decision Making)

Faculty
Mechanical Engineering
Publication Year
2025
Language
English
Graduation Date
28-08-2025
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Collaborative transportation and manipulation of cable-suspended loads by multiple UAVs offer a promising way to expand UAVs’ role in heavy-lifting operations. Existing approaches for collaborative aerial manipulation of a payload along a reference trajectory typically rely either on centralized control architectures or on reliable inter-agent communication. In this work, we propose a novel machine learning–based method for decentralized kinodynamic planning that operates effectively under partial observability and without inter-agent communication. Our method leverages imitation learning to train a decentralized, homogeneous student policy for each UAV by imitating a centralized kinodynamic motion planner that has access to privileged global observations. The student policy uses physics-informed neural networks that respect the derivative relationships of motion to generate trajectories that are step-wise consistent and guaranteed to be kinematically feasible. During training, the student policies utilize the full trajectory generated by the teacher policy, leading to improved sample efficiency. As a result, the student policy can be trained in under two hours on modest hardware. We validate our method in both simulation and real-world environments on an agile reference-trajectory-following task, demonstrating performance comparable to that of centralized approaches.
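The abstract's claim that the output trajectories are step-wise consistent and kinematically feasible by construction can be illustrated with a minimal sketch. This is not the thesis's actual network; it only shows the general idea that when a policy predicts the highest derivative (here, acceleration) and the lower derivatives are recovered by integration, the derivative relationships of motion hold exactly at every step, and clipping the predicted accelerations enforces a kinematic limit. The function name, the Euler integration scheme, and the dummy policy output are all illustrative assumptions.

```python
import numpy as np

def rollout_kinematic(p0, v0, accels, dt, a_max):
    """Integrate clipped accelerations into a step-wise consistent trajectory.

    The (hypothetical) network predicts only accelerations; positions and
    velocities are recovered by explicit Euler integration, so the relations
    p[k+1] = p[k] + v[k]*dt and v[k+1] = v[k] + a[k]*dt hold by construction,
    and clipping keeps every commanded acceleration within the limit a_max.
    """
    accels = np.clip(accels, -a_max, a_max)   # enforce the kinematic limit
    ps = [np.asarray(p0, dtype=float)]
    vs = [np.asarray(v0, dtype=float)]
    for a in accels:
        ps.append(ps[-1] + vs[-1] * dt)       # position from velocity
        vs.append(vs[-1] + a * dt)            # velocity from acceleration
    return np.stack(ps), np.stack(vs)

# Example: a dummy "policy" output for a 2D UAV over a 10-step horizon.
rng = np.random.default_rng(0)
raw = rng.normal(size=(10, 2)) * 5.0          # unconstrained network output
p, v = rollout_kinematic([0.0, 0.0], [1.0, 0.0], raw, dt=0.1, a_max=3.0)

# Step-wise consistency falls out of the construction: the finite difference
# of the positions reproduces the velocities exactly.
assert np.allclose(np.diff(p, axis=0) / 0.1, v[:-1])
```

A learned policy that instead predicted waypoints directly would have to satisfy these derivative constraints through its loss function; predicting the highest derivative and integrating makes them hold identically, which is the usual rationale for this style of physics-informed output.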
