Enabling embodied intelligence in robotics presents several unique challenges. A first major concern is the need for energy efficiency, low latency, and strong temporal reasoning to facilitate effective real-world interaction. Neuromorphic computing has garnered attention as a potential solution to these problems. Secondly, the goal-oriented nature of robotics makes it hard to shape a learning signal for deep neural networks. Reinforcement learning (RL) offers a framework that leverages goal-directed reward functions to create this learning signal.
A key challenge for recurrent and spiking neural networks trained via RL is reaching a stable baseline performance capable of producing episodes long enough to stabilize the hidden states. This stabilization is crucial for processing sequences that extend beyond the initial warm-up period of the temporal network. In this article, an online RL approach is proposed that enables temporal training with minimal changes to existing online algorithms, by introducing a secondary guiding policy whose sole objective is to prevent episode termination before the warm-up period is complete. This framework is shown to outperform offline RL methods and to significantly improve the wall-clock time of online RL methods adapted to sample sequences rather than single transitions. Next, the effect of surrogate gradients, the technique used to translate the learning signal from the RL framework into weight updates, is analyzed. It is found that the slope parametrizing the surrogate gradient plays a crucial role in online RL settings and can be exploited as an exploration mechanism.
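The guiding-policy idea can be illustrated with a minimal control-flow sketch. The names `guide_policy`, `agent_policy`, and `warmup_steps`, as well as the CartPole environment and the hand-crafted heuristics, are illustrative assumptions rather than the implementation described here: during the first `warmup_steps` of an episode, actions come from a policy whose sole objective is to keep the episode alive, while the temporal agent only updates its hidden state; control switches to the agent once the hidden state has had time to stabilize.

```python
# Minimal sketch of the warm-up guiding policy (illustrative assumptions,
# not the article's implementation).
import gymnasium as gym

env = gym.make("CartPole-v1")
warmup_steps = 20  # assumed length of the hidden-state warm-up period


def guide_policy(obs):
    # Stand-in for a policy rewarded only for preventing episode
    # termination; here it crudely pushes the cart under the falling pole.
    return 1 if obs[2] > 0 else 0


def agent_policy(obs, hidden):
    # Stand-in for the recurrent/spiking agent; it updates its hidden
    # state every step so the state is stabilized once it takes over.
    hidden = 0.9 * hidden + 0.1 * obs[2]
    return (1 if hidden > 0 else 0), hidden


obs, _ = env.reset(seed=0)
hidden = 0.0
for t in range(500):
    # The guiding policy acts during warm-up; the learner still observes
    # and updates its hidden state, but its actions are used only after.
    agent_action, hidden = agent_policy(obs, hidden)
    action = guide_policy(obs) if t < warmup_steps else agent_action
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        break
```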
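The role of the surrogate-gradient slope can be made concrete with a short PyTorch sketch. The fast-sigmoid surrogate, the name `slope`, and the example value are assumptions for illustration: the forward pass keeps the non-differentiable spike, while the backward pass substitutes a slope-parametrized pseudo-derivative, so changing the slope changes how strongly sub-threshold perturbations of the membrane potential reach the weight updates, which is what makes it usable as an exploration knob in the online RL setting.

```python
import torch


class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike with a slope-parametrized fast-sigmoid surrogate
    gradient (an illustrative choice of surrogate, assumed here)."""

    @staticmethod
    def forward(ctx, membrane_potential, slope):
        ctx.save_for_backward(membrane_potential)
        ctx.slope = slope
        # Forward: non-differentiable threshold at 0 produces the spike.
        return (membrane_potential > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (membrane_potential,) = ctx.saved_tensors
        # Backward: pseudo-derivative of a fast sigmoid; a larger slope
        # concentrates the gradient near the threshold, a smaller slope
        # spreads it out over sub-threshold potentials.
        surrogate = 1.0 / (1.0 + ctx.slope * membrane_potential.abs()) ** 2
        return grad_output * surrogate, None


# Usage: gradients reach the membrane potential only via the surrogate.
v = torch.randn(8, requires_grad=True)
spikes = SurrogateSpike.apply(v, 10.0)  # slope = 10.0 (assumed value)
spikes.sum().backward()
print(v.grad)
```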