This study explored the application of particle subswarm optimisation, a non-gradient-based swarm intelligence method, to neuroevolution for solving the powered descent landing problem. It was compared against a gradient-based Soft Actor-Critic (SAC) reinforcement learning algorithm under sparse and non-sparse reward settings. A simulation environment was developed incorporating aerodynamic modelling, grid fins, thrust vector control, and variable inertia. Results show that SAC struggled with non-sparse rewards: high-variance state-action values produced non-informative gradients, preventing it from generalising across the long-horizon powered descent task. Under sparse rewards, in contrast, SAC became trapped in a local optimum because the reward signal was too weak. Particle subswarm optimisation achieved rapid convergence to feasible solutions and reduced propellant consumption, outperforming SAC owing to its episodic gated reward structure, avoidance of local minima, and independence from gradients. This study establishes particle subswarm optimisation neuroevolution as a viable alternative to reinforcement learning for throttle assignment during powered descent of a launch vehicle's first stage.
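To make the optimisation loop concrete, the sketch below shows how a flat neural-network weight vector can be evolved with particle swarm updates against an episodic, gated landing reward. It is a minimal illustration only: it uses plain global-best PSO rather than the study's subswarm variant, and a hypothetical 1-D toy descent model (episode_return) stands in for the full simulator with aerodynamics, grid fins, thrust vector control, and variable inertia; the coefficients, network sizes, and gated-reward structure are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def episode_return(weights):
    """Hypothetical fitness: roll out a tiny throttle policy on a 1-D descent toy model.

    The study's simulator is far richer; this stand-in only exercises the
    optimisation loop with an episodic, gated reward at touchdown.
    """
    w1 = weights[:16].reshape(2, 8)   # inputs (altitude, velocity) -> 8 hidden units
    b1 = weights[16:24]
    w2 = weights[24:32]
    b2 = weights[-1]
    alt, vel, fuel, ret = 1000.0, -80.0, 100.0, 0.0
    for _ in range(200):
        h = np.tanh(np.array([alt / 1000.0, vel / 100.0]) @ w1 + b1)
        throttle = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))   # sigmoid -> throttle in [0, 1]
        vel += (30.0 * throttle - 9.81) * 0.1             # thrust minus gravity, dt = 0.1 s
        alt += vel * 0.1
        fuel -= throttle
        if alt <= 0.0:
            # Gated reward: landing bonus only for a soft touchdown,
            # plus a propellant-remaining bonus (assumed structure).
            ret = (100.0 if abs(vel) < 5.0 else 0.0) + max(fuel, 0.0)
            break
    return ret

dim = 2 * 8 + 8 + 8 + 1                       # flat parameter vector of the tiny policy
rng = np.random.default_rng(0)
n_particles, n_iters = 30, 100
w_inertia, c_cog, c_soc = 0.72, 1.49, 1.49    # standard global-best PSO coefficients

pos = rng.normal(0.0, 0.5, (n_particles, dim))
velo = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([episode_return(p) for p in pos])
gbest = pbest[pbest_f.argmax()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 1))
    velo = w_inertia * velo + c_cog * r1 * (pbest - pos) + c_soc * r2 * (gbest - pos)
    pos += velo
    f = np.array([episode_return(p) for p in pos])
    improved = f > pbest_f                    # episodic fitness is maximised directly
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmax()].copy()

print(f"best gated return: {pbest_f.max():.1f}")
```

Because fitness is evaluated only on whole-episode outcomes, no gradient of the reward with respect to the weights is ever needed, which is the property the abstract credits for PSO's robustness on this long-horizon task.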