Spiking Neural Networks for High-Speed Continuous Quadcopter Control Using Proximal Policy Optimization
Toward Energy-Efficient Neuromorphic Control of Agile Drones
M.F. van Breukelen (TU Delft - Aerospace Engineering)
Christophe De Wagter – Mentor (TU Delft - Control & Simulation)
Guido C.H.E. de Croon – Graduation committee member (TU Delft - Control & Simulation)
R. Ferede – Graduation committee member (TU Delft - Control & Simulation)
R.W. Vos – Graduation committee member (TU Delft - Control & Simulation)
E.J.O. Schrama – Graduation committee member (TU Delft - Astrodynamics & Space Missions)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
We present the first demonstration of a fully spiking actor-critic neural network policy, trained via Proximal Policy Optimization (PPO), for continuous control of an agile high-speed quadcopter in a gate-based navigation task. The spiking neural network (SNN) controller uses Leaky Integrate-and-Fire (LIF) neurons trained with surrogate gradients and decodes its outputs from spike rates over multiple integration cycles. It is benchmarked against a comparable artificial neural network (ANN) controller in both simulation and real-world flight tests. Although both controllers were trained to the same reward level, the SNN outperforms the ANN in simulation, achieving higher episode rewards, greater robustness, and a lower crash rate. In 12-second real-world trials, the SNN also outperforms the ANN, attaining a higher average reward (70.63 vs. 59.77), a higher mean velocity (7.94 vs. 6.99 m/s), and more gates cleared (46.33 vs. 40.67). An analysis of the spike integration cycle count reveals a clear trade-off. Lower cycle counts (fewer integration steps per control update) reduce control output resolution and hinder learning, whereas higher cycle counts produce smoother outputs but increase inference latency. Moderate cycle counts (5 or 8) provide the best balance, yielding high rewards, smoother outputs, and low execution-time overhead. These findings represent a key step toward neuromorphic control in embedded autonomous systems, demonstrating that SNN-based policies can outperform conventional ANN controllers in high-speed, agile robotic tasks.
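The abstract does not give neuron parameters or the exact decoding scheme, so the following is only a minimal sketch of why the integration cycle count limits output resolution. It assumes a single LIF neuron with a hypothetical decay factor `beta`, a unit threshold, reset by subtraction, a constant input current held over all cycles, and a hypothetical linear map from firing rate to a command in [-1, 1]. Under these assumptions, N cycles yield an integer spike count from 0 to N, so the decoded command can take only N + 1 distinct values. The sketch covers the forward pass only; surrogate-gradient training is not shown, and the function names are illustrative rather than the thesis implementation.

```python
def lif_spike_rate(current: float, num_cycles: int, beta: float = 0.9,
                   threshold: float = 1.0) -> float:
    """Drive one LIF neuron with a constant input current for num_cycles
    integration steps and return its firing rate (spikes per step).
    beta and threshold are hypothetical values chosen for illustration."""
    if num_cycles < 1:
        raise ValueError("num_cycles must be >= 1")
    membrane = 0.0
    spikes = 0
    for _ in range(num_cycles):
        # Leaky integration: decay the membrane potential, then add the input.
        membrane = beta * membrane + current
        if membrane >= threshold:
            spikes += 1
            # Reset by subtraction keeps any charge above the threshold.
            membrane -= threshold
    return spikes / num_cycles


def decode_action(rate: float) -> float:
    """Hypothetical linear map from a firing rate in [0, 1] to a command in [-1, 1]."""
    return 2.0 * rate - 1.0


def output_levels(num_cycles: int) -> list[float]:
    """Every command value a single rate-decoded neuron can produce."""
    return [decode_action(k / num_cycles) for k in range(num_cycles + 1)]


if __name__ == "__main__":
    for n in (2, 5, 8):
        levels = output_levels(n)
        print(f"N={n}: {len(levels)} levels, step {2.0 / n:.3f}")
```

With N = 2, only the commands -1, 0, and 1 are reachable. With N = 8, there are nine levels spaced 0.25 apart. Each extra cycle also adds one more sequential membrane update per control step, which is the latency side of the trade-off described in the abstract.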