Neural combinatorial optimization for multi-rendezvous mission design

Journal Article (2025)

Author(s)

Antonio López Rivera (The Exploration Company, AOCS-GNC)

MC Naeije (TU Delft - Astrodynamics & Space Missions)

Astrodynamics & Space Missions

DOI related publication

https://doi.org/10.1016/j.asr.2025.03.050

Keywords

Reinforcement learning; Trajectory optimization; Active debris removal; Debris mitigation; Graph attention networks

To reference this document use:

https://resolver.tudelft.nl/uuid:eb18e4fa-b8fa-4473-b3d2-1adfd35f3d8e

Publication Year

2025

Language

English

Issue number

10

Volume number

75

Pages (from-to)

7306-7326

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Optimal solutions to spacecraft routing problems are essential for space logistics activities such as Active Debris Removal (ADR), which addresses the growing threat of space debris. This research investigates the effectiveness of Neural Combinatorial Optimization (NCO) methods for the autonomous planning of low-thrust, multi-target ADR missions, an instance of the Space Traveling Salesman Problem (STSP). An autoregressive, attention-based routing policy was trained to solve 10-transfer ADR routing problems using REINFORCE, Advantage Actor-Critic, and Proximal Policy Optimization. A hyperparameter sensitivity analysis identified embedding dimension and the number of encoder layers as the critical factors influencing model performance, while an ablation study found the attention-based encoder to be the most critical architectural component of the policy. The trained policy was evaluated on 10-, 30-, and 50-transfer scenarios based on the Iridium 33 debris cloud, comparing its performance to a baseline provided by a novel ADR STSP routing heuristic (Dynamic RAAN Walk, DRW) and near-optimal benchmarks obtained via Heuristic Combinatorial Optimization (HCO). In missions with 10 transfers, the NCO policy achieved a mean optimality gap of 32%, outperforming DRW. However, performance degraded significantly in scenarios with 30 and 50 transfers, suggesting limited generalization to larger problems. A hyperparameter search further revealed that the performance of the NCO model considered in this work improves asymptotically with its size. Exposure to greater numbers of training scenarios did not yield significant performance gains. This work demonstrates that NCO methods can be effective for the autonomous planning of ADR missions with a limited number of targets, but face scalability and generalization challenges in more complex scenarios.
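
The abstract reports a mean optimality gap without stating how it is computed. A minimal sketch follows, assuming the common definition: the relative excess cost of a policy's route over a near-optimal reference route, averaged over scenarios. The function names, the cost values, and the definition itself are illustrative assumptions, not taken from the article.

```python
from statistics import mean


def optimality_gap(policy_cost: float, reference_cost: float) -> float:
    """Relative excess cost of a route over a reference route.

    Assumed definition (not confirmed by the article):
    (policy_cost - reference_cost) / reference_cost.
    """
    if reference_cost <= 0:
        raise ValueError("reference_cost must be positive")
    return (policy_cost - reference_cost) / reference_cost


def mean_optimality_gap(policy_costs, reference_costs) -> float:
    """Average the per-scenario gaps over paired cost lists."""
    if len(policy_costs) != len(reference_costs):
        raise ValueError("cost lists must have the same length")
    return mean(
        optimality_gap(p, r) for p, r in zip(policy_costs, reference_costs)
    )


# Hypothetical route costs for two scenarios (arbitrary units).
gap = mean_optimality_gap([1.2, 1.5], [1.0, 1.0])
print(f"mean optimality gap: {gap:.0%}")
```

Under this definition, a gap of 32% would mean the policy's routes cost on average about 1.32 times the reference routes.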

Files

1-s2.0-S0273117725002893-main.... (pdf)

(pdf | 3.65 Mb)