Spiking Neural-Networks-Based Data-Driven Control

Journal Article (2023)

Author(s)

Y. Liu (Student TU Delft)

W. Pan (TU Delft - Robot Dynamics)

Research Group

Robot Dynamics

DOI related publication

https://doi.org/10.3390/electronics1202031

Control, Reinforcement learning, Spiking neural network

To reference this document use:

https://resolver.tudelft.nl/uuid:b303ac20-37a6-4722-83d1-90874e97f378

Publication Year

2023

Language

English

Issue number

2

Volume number

12

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Machine learning can be effectively applied in control loops to make optimal control decisions robustly. There is increasing interest in using spiking neural networks (SNNs) as the apparatus for machine learning in control engineering because SNNs can potentially offer high energy efficiency, and new SNN-enabling neuromorphic hardware is being rapidly developed. A defining characteristic of control problems is that environmental reactions and delayed rewards must be considered. Although reinforcement learning (RL) provides the fundamental mechanisms to address such problems, implementing these mechanisms in SNN learning has been underexplored. Spike-timing-dependent plasticity (STDP) learning schemes modulated by temporal difference (TD-STDP) or by reward (R-STDP) have previously been proposed for RL with SNNs. Here, we designed and implemented an SNN controller to explore and compare these two schemes, using cart-pole balancing as a representative example. Although the TD-based learning rules are very general, the resulting model exhibits rather slow convergence, producing noisy and imperfect results even after prolonged training. We show that by integrating an understanding of the dynamics of the environment into the reward function of R-STDP, a robust SNN-based controller can be learned much more efficiently than with TD-STDP.
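
The two schemes compared in the abstract can both be read as three-factor rules: a pair-based STDP term scaled by a scalar modulator, which is a reward for R-STDP and a temporal-difference error for TD-STDP. The Python sketch below illustrates that reading only and is not the authors' implementation. The exponential STDP window, the learning rate, the weight bounds, the discount factor, and in particular the `shaped_reward` function (a hypothetical dynamics-aware reward for cart-pole) are all assumptions made for illustration.

```python
import math


def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Pair-based STDP kernel (assumed exponential form).

    dt = t_post - t_pre in ms. Pre-before-post (dt >= 0) potentiates,
    post-before-pre (dt < 0) depresses.
    """
    if dt >= 0:
        return a_plus * math.exp(-dt / tau)
    return -a_minus * math.exp(dt / tau)


def modulated_update(weight, dt, modulator, lr=0.01, w_min=0.0, w_max=1.0):
    """Three-factor update: the STDP term is scaled by a scalar modulator.

    For R-STDP the modulator is a reward; for TD-STDP it is a TD error.
    The result is clipped to [w_min, w_max].
    """
    new_weight = weight + lr * modulator * stdp_window(dt)
    return min(w_max, max(w_min, new_weight))


def td_error(reward, value_now, value_next, gamma=0.99):
    """Standard one-step TD error: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_now


def shaped_reward(angle, angular_velocity):
    """Hypothetical dynamics-aware reward for cart-pole (an assumption).

    Returns +1 when the pole is moving back toward upright (angle and
    angular velocity have opposite signs), and -1 otherwise.
    """
    return 1.0 if angle * angular_velocity < 0 else -1.0


# R-STDP: the shaped reward directly gates the STDP change.
w_r = modulated_update(0.5, dt=5.0, modulator=shaped_reward(0.1, -0.2))

# TD-STDP: the TD error gates the same STDP change.
w_td = modulated_update(0.5, dt=5.0, modulator=td_error(1.0, 0.5, 0.5))
```

In this sketch the only difference between the two schemes is where the modulator comes from. A shaped reward gives an immediate, signed signal at every step, whereas the TD error depends on value estimates that must themselves be learned, which is one plausible reading of the slower convergence reported for TD-STDP.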

Files

Electronics_12_00310.pdf

(pdf | 2.9 Mb)