Sheng Zhou
Artificial intelligence (AI) has made significant strides toward efficient online processing of sensory signals at the edge by using deep neural networks of ever-increasing size. However, this trend has brought escalating computational costs and energy consumption, which have become major obstacles to deploying these models and scaling them further. In this Perspective, we present a neuro-inspired vision for boosting the energy efficiency of AI for perception by leveraging brain-like dynamic sparsity. We categorize the forms of dynamic sparsity that arise from data redundancy and discuss strategies to enhance and exploit them through algorithm-hardware co-design. We also examine the technological, architectural, and algorithmic challenges that must be addressed to fully unlock the potential of dynamic-sparsity-aware, neuro-inspired AI for energy-efficient perception.
We present a sub-10-µW fully integrated SoC for on-device spoken language understanding (SLU). Its analog feature extractor (FEx) applies global and per-channel automatic gain control (AGC) to extend the system's dynamic range (DR), a critical requirement in real-world scenarios such as far-field operation. The on-chip streaming-mode recurrent neural network (RNN) accelerator exploits temporal sparsity and pooling, reducing its power by 2.3×. By combining hardware-aware training with a behavioral model of the FEx that captures circuit nonidealities, the network is trained to maintain SLU accuracy despite chip-to-chip variation. Fabricated in a 65-nm CMOS process, the SoC occupies 2.23 mm² and consumes 8.62 µW for end-to-end SLU. The 16-channel FEx achieves 93-dB DR while dissipating 1.85 µW at a 100-Hz feature frame rate. Evaluated on the 32-class Fluent Speech Commands dataset (FSCD), the SoC achieves 92.9% accuracy for 2.8-mV rms inputs and maintains >85% accuracy over a 75-dB input range.
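The dB figures above are easier to interpret as linear ratios. A minimal sketch, assuming the usual 20·log10 convention for amplitude (voltage) quantities; the helper names are hypothetical and used only for illustration:

```python
import math


def db_to_amplitude_ratio(db):
    """Ratio of rms amplitudes corresponding to a level difference in dB (20*log10 convention)."""
    return 10 ** (db / 20)


def amplitude_ratio_to_db(ratio):
    """Inverse conversion: amplitude ratio back to dB."""
    return 20 * math.log10(ratio)


if __name__ == "__main__":
    # 93-dB DR of the FEx: largest vs. smallest usable rms input amplitude.
    print(f"93 dB -> {db_to_amplitude_ratio(93):.0f}x in amplitude")
    # 75-dB input range over which accuracy stays above 85%.
    print(f"75 dB -> {db_to_amplitude_ratio(75):.0f}x in amplitude")
```

Under this convention, a 93-dB DR corresponds to roughly a 4.5 × 10⁴ span in input amplitude, and the 75-dB range to roughly 5.6 × 10³.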
Recurrent Neural Networks (RNNs) are useful for temporal sequence tasks. However, training RNNs involves dense matrix multiplications that require hardware capable of supporting a large number of arithmetic operations and memory accesses. Implementing online training of RNNs at the edge therefore calls for algorithms optimized for efficient hardware deployment. Inspired by the spiking neuron model, the Delta RNN exploits temporal sparsity during inference by skipping hidden-state updates for inactive neurons, i.e., neurons whose activation changes by less than a defined threshold between two timesteps. This work describes a training algorithm for Delta RNNs that also exploits temporal sparsity in the backward propagation phase, reducing the computational requirements of training at the edge. Because the computation graphs of forward and backward propagation are symmetric, the gradient computation for inactive neurons can be skipped as well. Results show a reduction of ∼80% in matrix operations when training a 56k-parameter Delta LSTM on the Fluent Speech Commands dataset, with negligible accuracy loss. Logic simulations of a hardware accelerator designed for the training algorithm show a 2–10× speedup in matrix computations for activation sparsity between 50% and 90%. Additionally, we show that the proposed Delta RNN training is useful for online incremental learning on edge devices with limited computing resources.
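The forward/backward symmetry described above can be illustrated on a single delta matrix-vector product. This is a minimal NumPy sketch of the generic delta-network idea, not the authors' training algorithm: it assumes each element's change is measured against its last propagated value (a common delta-network choice), accumulates pre-activations as m_t = m_{t-1} + W·Δ_t, and omits the LSTM gates, nonlinearities, and the gradient through the threshold itself. All function names are hypothetical.

```python
import numpy as np


def delta_encode(x, x_ref, theta):
    """Threshold the change of x against its last propagated value x_ref.

    Elements whose change is below theta are inactive: their delta is zero
    and their reference value is left unchanged.
    """
    diff = x - x_ref
    active = np.abs(diff) >= theta
    delta = np.where(active, diff, 0.0)
    x_ref_new = np.where(active, x, x_ref)
    return delta, x_ref_new, active


def delta_forward(W, xs, theta):
    """Forward pass m_t = m_{t-1} + W @ delta_t, reading only active columns of W."""
    m = np.zeros(W.shape[0])
    x_ref = np.zeros(W.shape[1])
    ms, deltas, actives = [], [], []
    for x in xs:
        delta, x_ref, active = delta_encode(x, x_ref, theta)
        idx = np.flatnonzero(active)
        m = m + W[:, idx] @ delta[idx]  # columns of inactive elements are never read
        ms.append(m)
        deltas.append(delta)
        actives.append(active)
    return ms, deltas, actives


def delta_backward_weight_grad(W_shape, deltas, actives, upstream):
    """Backward pass for dL/dW given upstream gradients g_t = dL/dm_t.

    Since m_{t+1} = m_t + W @ delta_{t+1}, the gradient reaching m_t is
    G_t = g_t + G_{t+1}, and step t contributes outer(G_t, delta_t) to dL/dW.
    Inactive elements have delta = 0, so their columns are skipped, which
    mirrors the sparsity of the forward pass.
    """
    grad_W = np.zeros(W_shape)
    G = np.zeros(W_shape[0])
    for delta, active, g in zip(reversed(deltas), reversed(actives), reversed(upstream)):
        G = G + g
        idx = np.flatnonzero(active)
        grad_W[:, idx] += np.outer(G, delta[idx])
    return grad_W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 6))
    # Slowly varying inputs, so many elements stay below the threshold.
    xs = np.cumsum(0.05 * rng.standard_normal((20, 6)), axis=0)
    ms, deltas, actives = delta_forward(W, xs, theta=0.05)
    upstream = list(rng.standard_normal((20, 4)))
    grad_W = delta_backward_weight_grad(W.shape, deltas, actives, upstream)
    skipped = 1.0 - np.mean([a.mean() for a in actives])
    print(f"fraction of skipped columns (forward and backward): {skipped:.2f}")
```

In this sketch the same `active` mask gates both the forward accumulation and the weight-gradient update, so any column skipped in the forward pass is also skipped in the backward pass. With `theta=0` every element is active and the result reduces to the dense product W·x_t.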