Bayesian quadrature policy optimization for spacecraft proximity maneuvers and docking



Journal Article (2024)

Author(s)

Desong Du (Harbin Institute of Technology, TU Delft - Robust Robot Systems)

Yanfang Liu (Harbin Institute of Technology)

Ouyang Zhang (Harbin Institute of Technology)

Naiming Qi (Harbin Institute of Technology)

Weiran Yao (Harbin Institute of Technology)

Wei Pan (TU Delft - Robust Robot Systems, The University of Manchester)

Keywords

Reinforcement learning; Bayesian quadrature policy optimization; Proximity maneuvers and docking

DOI (related publication)

https://doi.org/10.1016/j.ast.2024.109474 (final published version)

To reference this document, use:

https://resolver.tudelft.nl/uuid:fa210503-6dae-4ae6-a861-cf55bf06eda2


Publication Year

2024

Language

English

Journal title

Aerospace Science and Technology

Volume number

154

Article number

109474


Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Advancing autonomous spacecraft proximity maneuvers and docking (PMD) is crucial for enhancing the efficiency and safety of inter-satellite services. One primary challenge in PMD is the accurate a priori definition of the system model, often complicated by inherent uncertainties in the system modeling and observational data. To address this challenge, we propose a novel Lyapunov Bayesian actor-critic reinforcement learning algorithm that guarantees the stability of the control policy under uncertainty. The PMD task is formulated as a Markov decision process that involves the relative dynamic model, the docking cone, and the cost function. By applying Lyapunov theory, we reformulate temporal difference learning as a constrained Gaussian process regression, enabling the state-value function to act as a Lyapunov function. Additionally, the proposed Bayesian quadrature policy optimization method analytically computes policy gradients, effectively addressing stability constraints while accommodating informational uncertainties in the PMD task. Experimental validation on a spacecraft air-bearing testbed demonstrates the significant and promising performance of the proposed algorithm.
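The step of recasting temporal difference learning as Gaussian process regression can be illustrated with the unconstrained Gaussian process temporal difference (GPTD) model of Engel, Mannor and Meir. The sketch below is not the authors' algorithm. It omits the Lyapunov constraint and the Bayesian quadrature policy gradient, and it assumes deterministic transitions, a squared-exponential kernel, and Gaussian noise on each per-step cost. The function names `rbf_kernel` and `gptd_value_mean` and all parameter values are hypothetical. Each observed cost is modeled as c_t = V(x_t) - γ V(x_{t+1}) + noise, so the costs are a linear image H V of the latent value function, and the posterior mean of V follows from standard GP regression.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between row-stacked states a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gptd_value_mean(states, costs, gamma=0.9, noise_var=1e-6, length_scale=1.0):
    """Posterior mean of V at the visited states under the (assumed) GPTD model
    c_t = V(x_t) - gamma * V(x_{t+1}) + eps_t,  V ~ GP(0, k),  eps_t ~ N(0, noise_var).

    states: (n + 1, d) array of visited states x_0 .. x_n
    costs:  (n,) array of per-step costs c_0 .. c_{n-1}
    Returns the posterior mean v (n + 1,) and the TD operator H (n, n + 1).
    """
    n = len(costs)
    # H maps the vector of values [V(x_0), ..., V(x_n)] to the TD differences.
    H = np.zeros((n, n + 1))
    H[np.arange(n), np.arange(n)] = 1.0
    H[np.arange(n), np.arange(n) + 1] = -gamma
    K = rbf_kernel(states, states, length_scale)
    # Costs are Gaussian with covariance H K H^T + noise_var * I.
    G = H @ K @ H.T + noise_var * np.eye(n)
    alpha = np.linalg.solve(G, costs)
    # Posterior mean of V at the states: K H^T (H K H^T + noise_var I)^{-1} c.
    return K @ H.T @ alpha, H


if __name__ == "__main__":
    # Hypothetical 2-D trajectory moving straight toward a target at the origin.
    states = np.column_stack([np.linspace(5.0, 0.0, 11), np.zeros(11)])
    costs = states[:-1, 0] ** 2  # assumed quadratic distance cost
    v, H = gptd_value_mean(states, costs, length_scale=0.25)
    print(np.round(v, 2))
```

Because the costs enter only through the linear operator H, the posterior mean satisfies the one-step relation H v ≈ c up to a residual of order `noise_var`. Making the learned value act as a Lyapunov function, as the abstract describes, would require constraining this regression further; that step is not shown here.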

Files

1-s2.0-S1270963824006059-main.... (pdf, 3.31 Mb)