Evaluating policy learning methods on the powered descent rocket landing problem
Learning feasible policies for powered descent
J.R. van Zyl (TU Delft - Aerospace Engineering)
Erik-Jan van Kampen – Mentor (TU Delft - Control & Simulation)
Abstract
This study explored the application of particle subswarm optimisation, a non-gradient-based swarm intelligence method, to neuroevolution for solving the powered descent landing problem. It was compared to the gradient-based Soft Actor-Critic (SAC) reinforcement learning algorithm under sparse and non-sparse reward settings. A simulation environment was developed that incorporates aerodynamic modelling, grid fins, thrust vector control, and variable inertia. Results show that SAC struggled under non-sparse rewards: high-variance state-action values produced non-informative gradients, preventing it from generalising across the long-horizon powered descent task. Under sparse rewards, learning instead became trapped in a local optimum because the reward signal was too weak. Particle subswarm optimisation converged rapidly to feasible solutions with reduced propellant consumption, outperforming SAC thanks to its episodic gated reward structure, avoidance of local minima, and independence from gradients. This study establishes particle subswarm optimisation neuroevolution as a viable alternative to reinforcement learning for throttle assignment during the powered descent of a launch vehicle's first stage.
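To illustrate the ideas named in the abstract, the sketch below shows how a subswarm-partitioned particle swarm optimiser with an episodic, gated fitness might be applied to neural-network policy weights. This is a minimal, illustrative sketch only: the rollout interface, gate thresholds, fitness terms, subswarm scheme, and hyperparameters are assumptions for demonstration and are not taken from the thesis itself.

```python
import numpy as np

def episodic_fitness(weights, rollout_fn):
    """Gated episodic fitness (illustrative assumption, not the thesis' exact design).

    `rollout_fn` is assumed to simulate one powered-descent episode with a policy
    network parameterised by `weights` and return terminal touchdown speed,
    horizontal position error, and propellant used.
    """
    touchdown_speed, position_error, propellant_used = rollout_fn(weights)
    if touchdown_speed < 5.0 and position_error < 10.0:   # assumed feasibility gates
        # Propellant economy only counts once a feasible landing is achieved.
        return 100.0 - propellant_used
    # Before the gate is passed, shape the search toward feasibility.
    return -(touchdown_speed + position_error)

def subswarm_pso(rollout_fn, dim, n_subswarms=4, particles_per_swarm=10,
                 iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Run several independent PSO subswarms and return the best weights found."""
    rng = np.random.default_rng(seed)
    best_weights, best_fit = None, -np.inf
    for _ in range(n_subswarms):
        # Each subswarm searches independently, limiting the pull of any single
        # local optimum on the whole population.
        x = rng.normal(0.0, 0.5, size=(particles_per_swarm, dim))
        v = np.zeros_like(x)
        pbest, pbest_fit = x.copy(), np.full(particles_per_swarm, -np.inf)
        gbest, gbest_fit = None, -np.inf
        for _ in range(iters):
            for i in range(particles_per_swarm):
                f = episodic_fitness(x[i], rollout_fn)
                if f > pbest_fit[i]:
                    pbest_fit[i], pbest[i] = f, x[i].copy()
                if f > gbest_fit:
                    gbest_fit, gbest = f, x[i].copy()
            # Standard PSO velocity update: inertia + cognitive + social terms.
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = x + v
        if gbest_fit > best_fit:
            best_fit, best_weights = gbest_fit, gbest
    return best_weights, best_fit
```

Because fitness is evaluated only at the end of each episode, this kind of optimiser sidesteps the non-informative per-step gradients that the abstract identifies as SAC's difficulty on the long-horizon descent task; the gate ensures propellant economy is optimised only within the set of feasible landings.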
Files
File under embargo until 25-06-2026