SPARO: Scalable Sparsity-Aware Event-Driven Architecture for Low-Latency Edge Intelligence

Master Thesis (2024)

Author(s)

P. Upadhyay (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Rajendra Bishnoi – Mentor (TU Delft - Computer Engineering)

Kanishkan Vadivel – Mentor (Stichting IMEC Nederland)

Said Hamdioui – Mentor (TU Delft - Computer Engineering)

Faculty

Electrical Engineering, Mathematics and Computer Science

Accelerators, Edge-AI, Sparsity Exploitation

To reference this document use:

https://resolver.tudelft.nl/uuid:1abd356b-acda-4a27-9b36-75df237ccdb4

Publication Year

2024

Language

English

Graduation Date

30-05-2024

Awarding Institution

Delft University of Technology

Programme

Computer Science

Abstract

Deep Neural Networks (DNNs) have revolutionized numerous computational fields, from image and speech recognition to autonomous driving and natural language processing. Yet, the substantial computational and energy requirements of DNNs, particularly Convolutional Neural Networks (CNNs), pose significant obstacles to their deployment on resource-constrained edge devices. This thesis presents SPARO, a novel Scalable Sparsity-Aware Event-Driven Architecture designed to overcome these challenges by effectively exploiting sparsity in both neural network weights and activations.
SPARO’s architecture is built on an event-driven dataflow that exploits the inherent sparsity of CNNs, thereby reducing computational burden and energy consumption. This dataflow is divided into two distinct phases: the Update Phase and the Fire Phase. During the Update Phase, all computations required by incoming events are executed, while the Fire Phase applies non-linear activation functions and pooling operations to the output feature maps (OFMs). This phased approach streamlines data handling, eliminates redundant computations, and significantly improves overall processing efficiency.
A cornerstone of SPARO is its dynamic weight reuse mechanism, which maximizes the reuse of weights across multiple events. This significantly reduces the number of weight fetches needed, thereby improving arithmetic intensity. Furthermore, SPARO uses sparse data representations to minimize memory usage and further improve computational efficiency.
The efficacy of SPARO is demonstrated through comprehensive evaluations using both synthetic benchmarks and real-world CNN applications, such as gesture recognition and object detection. In the same form factor, SPARO achieves an 8.5x speedup over the baseline Seneca system on the TinyYolo vision task, delivering real-time performance while consuming only 14% of the baseline's energy.
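
The two-phase dataflow described in the abstract can be illustrated with a minimal software sketch. It is not SPARO's implementation, which is a hardware architecture whose details are not given here. The sketch assumes a stride-1, zero-padded 3x3 convolution, ReLU as the non-linearity, and 2x2 max pooling; the function names (`to_events`, `update_phase`, `fire_phase`, `dense_conv`) are hypothetical. Each non-zero input activation is treated as an event, and only events trigger multiply-accumulate work.

```python
import numpy as np

def to_events(ifm):
    """Turn a (C, H, W) input feature map into a list of (c, y, x, value) events.

    Only non-zero activations become events, so sparse inputs yield few events.
    """
    return [(c, y, x, ifm[c, y, x]) for c, y, x in zip(*np.nonzero(ifm))]

def update_phase(events, weights, out_shape, pad=1):
    """Accumulate every event's contribution into the output state.

    weights has shape (K, C, kh, kw); out_shape is (K, H_out, W_out).
    Assumes stride 1 and zero padding (an assumption of this sketch).
    """
    _, _, kh, kw = weights.shape
    _, h_out, w_out = out_shape
    state = np.zeros(out_shape, dtype=np.float64)
    for c, y, x, v in events:
        for dy in range(kh):
            for dx in range(kw):
                # Output position whose receptive field sees (y, x) at offset (dy, dx).
                oy, ox = y + pad - dy, x + pad - dx
                if 0 <= oy < h_out and 0 <= ox < w_out:
                    # One weight vector updates all K output channels at once.
                    state[:, oy, ox] += v * weights[:, c, dy, dx]
    return state

def fire_phase(state, pool=2):
    """Apply ReLU, then non-overlapping pool x pool max pooling, to the state."""
    act = np.maximum(state, 0.0)
    k, h, w = act.shape
    h2, w2 = h // pool, w // pool
    act = act[:, :h2 * pool, :w2 * pool]
    return act.reshape(k, h2, pool, w2, pool).max(axis=(2, 4))

def dense_conv(ifm, weights, pad=1):
    """Dense reference (cross-correlation, stride 1) used to check the update phase."""
    k_out, _, kh, kw = weights.shape
    _, h, w = ifm.shape
    padded = np.pad(ifm, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((k_out, h + 2 * pad - kh + 1, w + 2 * pad - kw + 1))
    for oy in range(out.shape[1]):
        for ox in range(out.shape[2]):
            patch = padded[:, oy:oy + kh, ox:ox + kw]
            out[:, oy, ox] = np.tensordot(weights, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out
```

Calling `fire_phase(update_phase(to_events(ifm), weights, out_shape))` produces the pooled output feature map. In this sketch the work in `update_phase` scales with the number of non-zero activations rather than with the full input size, which is the property the event-driven dataflow relies on.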

Files

PANKAJ_5751853_SPARO_THESIS.pd... (pdf)

License info not available

File under embargo until 30-05-2026