Conventional and Reinforcement Learning Control of MXER Tether Dynamics for Extended Payload Rendezvous

Conference Paper (2025)
Author(s)

Zander du Toit (Student TU Delft)

Marc Naeije (TU Delft - Astrodynamics & Space Missions)

Research Group
Astrodynamics & Space Missions
DOI related publication
https://doi.org/10.52202/083094-0004 Final published version
Publication Year
2025
Language
English
Pages (from-to)
31-47
Publisher
International Astronautical Federation, IAF
ISBN (electronic)
9798331329426
Event
23rd IAA Symposium on Visions and Strategies for the Future at the 76th International Astronautical Congress, IAC 2025 (2025-09-29 - 2025-10-03), Sydney, Australia

Abstract

Momentum Exchange with Electrodynamic Reboost (MXER) tethers transfer captured payloads to higher orbits using a long, rotating tether. The transfer occurs through an exchange of momentum from the tether to the payload, after which the tether's orbital energy is restored via electrodynamic thrusting. MXER tethers offer a sustainable, reusable, and near-propellantless alternative to rockets for orbital and interplanetary transfer of payloads. However, the short rendezvous window for payload capture, typically lasting mere seconds, presents a significant challenge to the use of these tether systems. This research investigates the control of MXER tether dynamics, aiming to improve payload capture success by extending the rendezvous window. Three actuator configurations previously studied for librating tethers (a baseline tip-reeling system, a climbing actuator mass, and a reeling actuator mass) are compared and adapted for a rotating MXER system based on the Cislunar Tether Transport System design. A 2D rigid-body model is used to simulate the system dynamics. Initially, a conventional iterative Linear Quadratic Regulator (iLQR) establishes a baseline for control performance. Subsequently, the model-free Soft Actor-Critic (SAC) Deep Reinforcement Learning (RL) algorithm is implemented and trained. Both control methods are tested with and without dynamic system constraints. The performance of each configuration is evaluated based on rendezvous window extension and constraint satisfaction. In the unconstrained case, the reeler configuration is shown to be the most effective, extending the rendezvous window from 0.6 seconds in the uncontrolled case to 1.8 seconds. The SAC RL algorithm matches the performance of the tuned iLQR controller, but produces a less smooth control policy with sporadic actuator use. Constrained control proves more challenging, with neither the augmented-Lagrangian iLQR nor the SAC-based controller managing to extend the rendezvous window; the former is overly conservative, while the latter fails to satisfy operational constraints.
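
The rendezvous window used as the performance metric above can be illustrated with a minimal sketch. It assumes the window is the longest contiguous time span during which the tether-tip-to-payload separation stays within a capture tolerance; the abstract does not give the paper's exact definition. The function name `rendezvous_window`, the 3 m tolerance, and the 10 m/s closing speed are hypothetical, chosen only so that the toy result lands near the 0.6 s uncontrolled figure quoted above. They are not the paper's parameters.

```python
import numpy as np


def rendezvous_window(t, separation, tolerance):
    """Longest contiguous duration (s) with separation <= tolerance.

    Hypothetical helper: one possible definition of the rendezvous
    window, not necessarily the one used in the paper.
    """
    t = np.asarray(t, dtype=float)
    inside = np.asarray(separation, dtype=float) <= tolerance
    best = 0.0
    start = None  # index where the current in-tolerance run began
    for i, ok in enumerate(inside):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            best = max(best, t[i - 1] - t[start])
            start = None
    if start is not None:  # run extends to the end of the record
        best = max(best, t[-1] - t[start])
    return best


# Toy fly-by (assumed numbers): the payload passes the tether tip at
# t = 5 s with a constant 10 m/s relative speed, sampled every 1 ms.
t = np.linspace(0.0, 10.0, 10001)
separation = 10.0 * np.abs(t - 5.0)  # metres

# Analytically the window is 2 * tolerance / speed = 0.6 s;
# the sampled value is close to this.
window = rendezvous_window(t, separation, tolerance=3.0)
print(f"rendezvous window: {window:.3f} s")
```

Under this definition, a controller extends the window by keeping the separation curve flatter near its minimum, which is the effect the actuator configurations are evaluated on.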