Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training

Journal Article (2024)

Author(s)

Xi Chen (Sensors Group, Universitat Zurich)

Chang Gao (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Zuowen Wang (Universitat Zurich)

Longbiao Cheng (Universitat Zurich)

Sheng Zhou (Universitat Zurich)

Shih Chii Liu (Universitat Zurich)

Tobi Delbruck (Universitat Zurich)

Research Group

Electronics

DOI related publication

https://doi.org/10.1609/aaai.v38i10.29020 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:320828a8-46d6-4e4f-a938-139bb7eb9cca

Publication Year

2024

Language

English

Issue number

10

Volume number

38

Pages (from-to)

11399-11406

Event

38th AAAI Conference on Artificial Intelligence, AAAI 2024 (2024-02-20 - 2024-02-27), Vancouver, Canada

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recurrent Neural Networks (RNNs) are useful in temporal sequence tasks. However, training RNNs involves dense matrix multiplications, which require hardware that can support a large number of arithmetic operations and memory accesses. Implementing online training of RNNs on the edge calls for optimized algorithms for efficient deployment on hardware. Inspired by the spiking neuron model, the Delta RNN exploits temporal sparsity during inference by skipping the hidden-state updates from inactivated neurons, that is, neurons whose change of activation across two timesteps is below a defined threshold. This work describes a training algorithm for Delta RNNs that exploits temporal sparsity in the backward propagation phase to reduce computational requirements for training on the edge. Because the computation graphs of forward and backward propagation are symmetric during training, the gradient computation of inactivated neurons can also be skipped. Results show a reduction of ∼80% in matrix operations for training a 56k-parameter Delta LSTM on the Fluent Speech Commands dataset with negligible accuracy loss. Logic simulations of a hardware accelerator designed for the training algorithm show a 2-10× speedup in matrix computations for an activation sparsity range of 50%-90%. Additionally, we show that the proposed Delta RNN training will be useful for online incremental learning on edge devices with limited computing resources.
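
The thresholding idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`delta_encode`, `sparse_matvec_accumulate`, `sparse_weight_grad`), the stored reference state `x_ref`, and the single linear accumulation are all assumptions made for illustration. The sketch shows only that a column skipped in the forward matrix-vector product is also skipped when forming the weight gradient, which is one way to read the "symmetric" sparsity the abstract describes.

```python
def delta_encode(x, x_ref, threshold):
    """Return (delta, new_ref). delta[i] is nonzero only when the change
    from the stored reference exceeds the threshold (hypothetical helper)."""
    delta, new_ref = [], []
    for xi, ri in zip(x, x_ref):
        d = xi - ri
        if abs(d) > threshold:
            delta.append(d)      # neuron is active: propagate the change
            new_ref.append(xi)   # and remember the new value
        else:
            delta.append(0.0)    # neuron is inactive: nothing to propagate
            new_ref.append(ri)   # reference stays unchanged
    return delta, new_ref


def sparse_matvec_accumulate(W, delta, acc):
    """Compute acc + W @ delta, skipping columns whose delta is zero.
    Returns (result, number_of_multiply_accumulates)."""
    out = list(acc)
    macs = 0
    for j, dj in enumerate(delta):
        if dj == 0.0:
            continue             # whole column of W is skipped
        for i in range(len(W)):
            out[i] += W[i][j] * dj
            macs += 1
    return out, macs


def sparse_weight_grad(grad_acc, delta):
    """Weight gradient outer(grad_acc, delta) for the accumulation above.
    Columns that were inactive in the forward pass stay zero and cost nothing."""
    grad_W = [[0.0] * len(delta) for _ in grad_acc]
    macs = 0
    for j, dj in enumerate(delta):
        if dj == 0.0:
            continue             # same columns skipped as in the forward pass
        for i, gi in enumerate(grad_acc):
            grad_W[i][j] = gi * dj
            macs += 1
    return grad_W, macs


# Illustrative values only (assumed, not taken from the paper).
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x_ref = [0.0, 0.0, 0.0]
x = [0.05, 1.0, -0.02]

delta, x_ref = delta_encode(x, x_ref, threshold=0.1)
acc, fwd_macs = sparse_matvec_accumulate(W, delta, [0.0, 0.0])
grad_W, bwd_macs = sparse_weight_grad([0.5, -1.0], delta)
print(delta, acc, fwd_macs, bwd_macs)
```

With these assumed values only one of three inputs crosses the threshold, so both the forward product and the weight gradient touch a single column of `W` (2 multiply-accumulates each) instead of all three (6 each).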

Files

29020-Article_Text-33074-1-2-2... (pdf)

(pdf | 0.511 Mb)

- Embargo expired on 30-09-2024

License info not available