Evaluating policy learning methods on the powered descent rocket landing problem
Learning feasible policies for powered descent
J.R. van Zyl (TU Delft - Aerospace Engineering)
Erik-Jan van Kampen – Mentor (TU Delft - Control & Simulation)
Abstract
This study explored the application of particle subswarm optimisation, a non-gradient-based swarm intelligence method, to neuroevolution for solving the powered descent landing problem. It was compared to the gradient-based Soft Actor-Critic (SAC) reinforcement learning algorithm under sparse and non-sparse reward settings. A simulation environment was developed that incorporates aerodynamic modelling, grid fins, thrust vector control, and variable inertia. Results show that SAC struggled under non-sparse rewards: high-variance state-action values produced non-informative gradients, preventing it from generalising across the long-horizon powered descent task. Under sparse rewards, learning instead became trapped in a local optimum because the reward signal was too weak. Particle subswarm optimisation converged rapidly to feasible solutions with reduced propellant consumption, outperforming SAC thanks to its episodic gated reward structure, avoidance of local minima, and independence from gradients. This study establishes particle subswarm optimisation neuroevolution as a viable alternative to reinforcement learning for throttle assignment during the powered descent of a launch vehicle's first stage.
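To illustrate the ideas named in the abstract, the sketch below shows how a subswarm-partitioned particle swarm optimiser with an episodic, gated fitness might be applied to neural-network policy weights. This is a minimal, illustrative sketch only: the rollout interface, gate thresholds, fitness terms, subswarm scheme, and hyperparameters are assumptions for demonstration and are not taken from the thesis itself.

```python
import numpy as np

def episodic_fitness(weights, rollout_fn):
    """Gated episodic fitness (illustrative assumption, not the thesis' exact design).

    `rollout_fn` is assumed to simulate one powered-descent episode with a policy
    network parameterised by `weights` and return terminal touchdown speed,
    horizontal position error, and propellant used.
    """
    touchdown_speed, position_error, propellant_used = rollout_fn(weights)
    if touchdown_speed < 5.0 and position_error < 10.0:   # assumed feasibility gates
        # Propellant economy only counts once a feasible landing is achieved.
        return 100.0 - propellant_used
    # Before the gate is passed, shape the search toward feasibility.
    return -(touchdown_speed + position_error)

def subswarm_pso(rollout_fn, dim, n_subswarms=4, particles_per_swarm=10,
                 iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Run several independent PSO subswarms and return the best weights found."""
    rng = np.random.default_rng(seed)
    best_weights, best_fit = None, -np.inf
    for _ in range(n_subswarms):
        # Each subswarm searches independently, limiting the pull of any single
        # local optimum on the whole population.
        x = rng.normal(0.0, 0.5, size=(particles_per_swarm, dim))
        v = np.zeros_like(x)
        pbest, pbest_fit = x.copy(), np.full(particles_per_swarm, -np.inf)
        gbest, gbest_fit = None, -np.inf
        for _ in range(iters):
            for i in range(particles_per_swarm):
                f = episodic_fitness(x[i], rollout_fn)
                if f > pbest_fit[i]:
                    pbest_fit[i], pbest[i] = f, x[i].copy()
                if f > gbest_fit:
                    gbest_fit, gbest = f, x[i].copy()
            # Standard PSO velocity update: inertia + cognitive + social terms.
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = x + v
        if gbest_fit > best_fit:
            best_fit, best_weights = gbest_fit, gbest
    return best_weights, best_fit
```

Because fitness is evaluated only at the end of each episode, this kind of optimiser sidesteps the non-informative per-step gradients that the abstract identifies as SAC's difficulty on the long-horizon descent task; the gate ensures propellant economy is optimised only within the set of feasible landings.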
Files
File under embargo until 25-06-2026