Spiking Neural Networks for High-Speed Continuous Quadcopter Control Using Proximal Policy Optimization

Toward Energy-Efficient Neuromorphic Control of Agile Drones

Master Thesis (2025)

Author(s)

M.F. van Breukelen (TU Delft - Aerospace Engineering)

Contributor(s)

C. de Wagter – Mentor (TU Delft - Aerospace Engineering)

G.C.H.E. de Croon – Graduation committee member (TU Delft - Aerospace Engineering)

R. Ferede – Graduation committee member (TU Delft - Aerospace Engineering)

R.W. Vos – Graduation committee member (TU Delft - Aerospace Engineering)

E.J.O. Schrama – Graduation committee member (TU Delft - Aerospace Engineering)

Faculty

Aerospace Engineering

Reinforcement Learning, Artificial Neural Networks, Neuromorphic Computing, Drones, Spiking Neural Network (SNN), Artificial Intelligence (AI)

To reference this document use

https://resolver.tudelft.nl/uuid:57263001-b839-4624-a23b-67cafb8a8441

More Info

Publication Year

2025

Language

English

Graduation Date

19-09-2025

Awarding Institution

Delft University of Technology

Programme

Aerospace Engineering

Downloads counter

142

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We present the first demonstration of a fully spiking actor-critic neural network policy, trained via Proximal Policy Optimization (PPO), for continuous control of an agile, high-speed quadcopter in a gate-based navigation task. The spiking neural network (SNN) controller employs Leaky Integrate-and-Fire neurons with surrogate gradient training and spike-rate decoding over multiple integration cycles, and is benchmarked against a comparable artificial neural network (ANN) controller in both simulation and real-world flight tests. Although both controllers are trained to the same reward level, the SNN performs better in simulation, with higher episode rewards, greater robustness, and a lower crash rate. In 12-second real-world trials, the SNN also outperforms the ANN, attaining a higher average reward (70.63 vs 59.77), greater mean velocity (7.94 vs 6.99 m/s), and more gates cleared (46.33 vs 40.67). An analysis of the spike integration cycle count (the number of integration steps per control update) reveals a clear trade-off: lower cycle counts reduce control output resolution and hinder learning, whereas higher cycle counts improve smoothness but increase inference latency. Moderate cycle counts (5 or 8) provide the best balance, yielding high rewards, smoother outputs, and low execution time overhead. These findings represent a key step forward for neuromorphic control in embedded autonomous systems, demonstrating that SNN-based policies can outperform conventional ANN controllers in high-speed, agile robotic tasks.
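The trade-off between cycle count and output resolution can be illustrated with a minimal sketch of a single Leaky Integrate-and-Fire neuron whose spikes are rate-decoded into a continuous command. This is not the thesis implementation: the decay factor `beta`, the firing `threshold`, the soft reset, and the linear mapping of spike rate onto the range [-1, 1] are all assumptions chosen for illustration, and surrogate gradient training is not shown. The point it makes is that a neuron run for N cycles can emit between 0 and N spikes, so a purely rate-decoded output can take at most N + 1 distinct values.

```python
def lif_spike_count(input_current, num_cycles, beta=0.9, threshold=1.0):
    """Run one LIF neuron for num_cycles steps on a constant input.

    beta (membrane decay) and threshold are illustrative assumptions.
    Returns the number of spikes emitted.
    """
    membrane = 0.0
    spikes = 0
    for _ in range(num_cycles):
        # Leaky integration: decay the old potential, add the new input.
        membrane = beta * membrane + input_current
        if membrane >= threshold:
            spikes += 1
            membrane -= threshold  # soft reset (assumed, not from the thesis)
    return spikes


def rate_decode(spike_count, num_cycles, low=-1.0, high=1.0):
    """Map a spike count to a continuous command via its firing rate.

    The linear mapping onto [low, high] is an assumed decoding scheme.
    """
    rate = spike_count / num_cycles
    return low + rate * (high - low)


def output_levels(num_cycles, low=-1.0, high=1.0):
    """List every command value a rate-decoded neuron can produce."""
    return [rate_decode(k, num_cycles, low, high) for k in range(num_cycles + 1)]


if __name__ == "__main__":
    for cycles in (2, 5, 8):
        levels = output_levels(cycles)
        print(f"{cycles} cycles: {len(levels)} distinct output levels")
```

Under these assumptions, 2 cycles allow only three command values, while 5 and 8 cycles allow six and nine. Each additional cycle also adds one more forward pass through the network per control update, which is the latency cost the abstract refers to.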

Files

Master_Thesis_Spiking_PPO-4.pd... (pdf)

(pdf | 6.1 MB)

License info not available