SPARO: Scalable Sparsity-Aware Event-Driven Architecture for Low-Latency Edge Intelligence

Deep Neural Networks (DNNs) have revolutionized numerous computational fields, from image and speech recognition to autonomous driving and natural language processing. Yet, the substantial computational and energy requirements of DNNs, particularly Convolutional Neural Networks (CNNs), pose significant obstacles to their deployment on resource-constrained edge devices. This thesis presents SPARO, a novel Scalable Sparsity-Aware Event-Driven Architecture designed to overcome these challenges by effectively exploiting sparsity in both neural network weights and activations.
SPARO’s architecture is founded upon an event-driven dataflow that harnesses the inherent sparsity of CNNs, reducing both computational burden and energy consumption. This dataflow is divided into two distinct phases: the Update Phase and the Fire Phase. During the Update Phase, all computations required by incoming events are executed; the Fire Phase then applies non-linear activation functions and pooling operations to the output feature maps (OFMs). This phased approach streamlines data handling, eliminates redundant computation, and improves overall processing efficiency.
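The two-phase dataflow can be illustrated with a minimal software sketch. This is not the hardware implementation described in the thesis; it is an illustrative model assuming a dense OFM accumulator, events encoded as (x, y, channel, value) tuples, ReLU activation, and max pooling. All function and variable names here are hypothetical.

```python
import numpy as np

def update_phase(events, weights, ofm):
    """Update Phase sketch: accumulate each sparse input event into the OFM.

    Only nonzero activations arrive as events, so only those positions
    trigger multiply-accumulate work. `weights` is assumed to have shape
    (C_out, C_in, k, k); `ofm` has shape (C_out, H, W).
    """
    k = weights.shape[-1]  # square kernel size (illustrative)
    for x, y, c, value in events:
        # Scatter value * kernel into the receptive field this event touches.
        ofm[:, x:x + k, y:y + k] += value * weights[:, c]
    return ofm

def fire_phase(ofm, pool=2):
    """Fire Phase sketch: apply the non-linearity and pooling, then emit
    only the nonzero outputs as events for the next layer."""
    act = np.maximum(ofm, 0.0)  # ReLU
    co, h, w = act.shape
    pooled = act.reshape(co, h // pool, pool, w // pool, pool).max(axis=(2, 4))
    nz = np.argwhere(pooled > 0)
    return [(x, y, c, pooled[c, x, y]) for c, x, y in nz]
```

Separating the phases this way means the non-linearity and pooling run once per output map rather than once per event, which is one source of the redundancy elimination described above.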
A cornerstone of SPARO’s design is its dynamic weight reuse mechanism, which maximizes the reuse of weights across multiple events. This reduces the number of weight fetches needed, thereby improving arithmetic intensity. SPARO additionally employs sparse data representations to minimize memory usage and further enhance computational efficiency.
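The reuse idea can be sketched in a few lines: if events are grouped before processing so that all events sharing a weight set are handled back to back, one weight fetch serves the whole group. This is an illustrative model only, assuming grouping by input channel; the actual grouping key and on-chip buffering policy of SPARO may differ, and `fetch_weights` / `apply_event` are hypothetical callbacks.

```python
from collections import defaultdict

def group_events_by_channel(events):
    """Bucket (x, y, channel, value) events by input channel, so one
    weight fetch per channel can be reused across every event in the
    bucket. Weight traffic then scales with the number of active
    channels rather than the number of events."""
    buckets = defaultdict(list)
    for x, y, c, value in events:
        buckets[c].append((x, y, value))
    return buckets

def process_with_reuse(events, fetch_weights, apply_event):
    """Process all events, counting weight fetches for illustration."""
    fetches = 0
    for c, bucket in group_events_by_channel(events).items():
        w_c = fetch_weights(c)  # single fetch, reused for the whole bucket
        fetches += 1
        for x, y, value in bucket:
            apply_event(w_c, x, y, value)
    return fetches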
The efficacy of SPARO is demonstrated through comprehensive evaluations using both synthetic benchmarks and real-world CNN applications, such as gesture recognition and object detection. At the same form factor, SPARO achieves an 8.5x speedup over the baseline Seneca system, delivering real-time performance on the TinyYolo vision task while consuming only 14% of the energy.