Bayesian quadrature policy optimization for spacecraft proximity maneuvers and docking

Journal Article (2024)
Author(s)

Desong Du (Harbin Institute of Technology, TU Delft - Robust Robot Systems)

Yanfang Liu (Harbin Institute of Technology)

Ouyang Zhang (Harbin Institute of Technology)

Naiming Qi (Harbin Institute of Technology)

Weiran Yao (Harbin Institute of Technology)

Wei Pan (TU Delft - Robust Robot Systems, The University of Manchester)

Research Group
Robust Robot Systems
DOI related publication
https://doi.org/10.1016/j.ast.2024.109474
Publication Year
2024
Language
English
Volume number
154
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Advancing autonomous spacecraft proximity maneuvers and docking (PMD) is crucial for enhancing the efficiency and safety of inter-satellite services. One primary challenge in PMD is the accurate a priori definition of the system model, which is often complicated by inherent uncertainties in system modeling and observational data. To address this challenge, we propose a novel Lyapunov Bayesian actor-critic reinforcement learning algorithm that guarantees the stability of the control policy under uncertainty. The PMD task is formulated as a Markov decision process that involves the relative dynamic model, the docking cone, and the cost function. By applying Lyapunov theory, we reformulate temporal difference learning as a constrained Gaussian process regression, enabling the state-value function to act as a Lyapunov function. Additionally, the proposed Bayesian quadrature policy optimization method computes policy gradients analytically, effectively addressing stability constraints while accommodating informational uncertainties in the PMD task. Experimental validation on a spacecraft air-bearing testbed demonstrates the significant and promising performance of the proposed algorithm.
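The abstract names Bayesian quadrature (BQ) as the tool that lets policy gradients be computed analytically. The sketch below illustrates only the generic BQ building block, not the authors' policy-gradient estimator. It assumes a zero-mean Gaussian-process prior on the integrand, a squared-exponential (RBF) kernel, and a Gaussian integration measure. Under these assumptions, the integral Z = ∫ f(x) N(x; mu, Sigma) dx has a Gaussian posterior whose mean and variance are available in closed form. The function names `rbf_kernel` and `bayesian_quadrature` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscales, signal_var=1.0):
    """Squared-exponential kernel s^2 * exp(-0.5 * sum(((a - b) / l)^2)) for row-wise inputs."""
    diff = a[:, None, :] - b[None, :, :]                    # shape (n, m, d)
    sq = np.sum((diff / lengthscales) ** 2, axis=-1)        # scaled squared distances
    return signal_var * np.exp(-0.5 * sq)


def bayesian_quadrature(x, y, mu, cov, lengthscales, signal_var=1.0, jitter=1e-10):
    """Posterior mean and variance of Z = ∫ f(x) N(x; mu, cov) dx given f(x_i) = y_i.

    Hypothetical sketch: zero-mean GP prior on f with an RBF kernel.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:                                         # treat a 1-D array as n scalar nodes
        x = x[:, None]
    y = np.asarray(y, dtype=float)
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    lengthscales = np.atleast_1d(np.asarray(lengthscales, dtype=float))
    d = x.shape[1]
    lam = np.diag(lengthscales ** 2)                        # Lambda = diag(l^2)
    lam_inv_cov = np.linalg.solve(lam, cov)                 # Lambda^{-1} Sigma

    # Gram matrix with a small jitter for numerical stability.
    K = rbf_kernel(x, x, lengthscales, signal_var) + jitter * np.eye(len(x))

    # Kernel mean z_i = ∫ k(x, x_i) N(x; mu, Sigma) dx
    #   = s^2 |I + Lambda^{-1} Sigma|^{-1/2} exp(-0.5 (x_i - mu)^T (Lambda + Sigma)^{-1} (x_i - mu))
    diff = x - mu
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(lam + cov), diff)
    z = signal_var * np.linalg.det(np.eye(d) + lam_inv_cov) ** -0.5 * np.exp(-0.5 * quad)

    # Prior variance of Z: ∫∫ k(x, x') p(x) p(x') dx dx' = s^2 |I + 2 Lambda^{-1} Sigma|^{-1/2}
    kk = signal_var * np.linalg.det(np.eye(d) + 2.0 * lam_inv_cov) ** -0.5

    # Solve K^{-1} y and K^{-1} z through a Cholesky factor K = L L^T.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    beta = np.linalg.solve(L.T, np.linalg.solve(L, z))
    return float(z @ alpha), float(kk - z @ beta)


if __name__ == "__main__":
    # Usage: estimate E[x^2] under N(0, 1) and compare with the exact value 1.0.
    nodes = np.linspace(-5.0, 5.0, 21)
    mean, var = bayesian_quadrature(nodes, nodes ** 2, mu=0.0, cov=1.0, lengthscales=1.0)
    print(f"BQ estimate: {mean:.4f} (posterior variance {var:.2e})")
```

The posterior variance quantifies how uncertain the integral estimate remains given the sampled nodes. This property makes BQ attractive when evaluations are expensive, as with rollouts on a physical testbed. How the paper couples BQ with its Lyapunov-constrained Gaussian-process critic is not reproduced here.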