Multi-Agent Reinforcement Learning for Swarm Planetary Exploration

Conference Paper (2026)

Author(s)

A. Menor de Oñate (Student TU Delft)

E. van Kampen (TU Delft - Aerospace Engineering)

Research Group

Control & Simulation

DOI related publication

https://doi.org/10.2514/6.2026-0131 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:c97e7a67-dcad-4114-b7b6-02c1c4150251

Publication Year

2026

Language

English

Article number

AIAA 2026-0131

Publisher

American Institute of Aeronautics and Astronautics Inc. (AIAA)

ISBN (print)

9781624107658

ISBN (electronic)

978-1-62410-765-8

Event

AIAA SCITECH 2026 Forum (2026-01-12 - 2026-01-16), Orlando, United States

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Exploring planetary bodies with robot swarms could increase the value of exploration missions by enabling novel measurements and explorations previously deemed impractical or unattainable. Despite this potential, the technology readiness level of planetary swarms remains low. This work uses multi-agent reinforcement learning to find control policies that allow swarms to explore unknown areas autonomously and in a decentralized manner, contributing to the technology readiness of the field. To this end, a multi-agent proximal policy optimization (MAPPO) algorithm is proposed in which the policy takes LIDAR perception information as input and the value function takes both local and global environment information. The algorithm finds control policies that exhibit cooperative behavior and generalize to unseen swarm sizes and environments while learning from simple, sparse reward functions. Moreover, different types of reward functions, value function inputs, and environment configurations are investigated.

Files

Menor-de-o_ate-van-kampen-2026... (pdf)

(pdf | 4.19 MB)

License info not available