A Survey on Accelerating Sparse CNN Inference on GPUs

Bachelor thesis (2022)

Authors

Q. Chen Electrical Engineering, Mathematics and Computer Science

Contributors

Hasan Mohamed University of Zürich (supervisor 1)

Shih Chii Liu University of Zürich (supervisor 1)

N. Tömen Pattern Recognition and Bioinformatics - (supervisor 1)

Marco Zuniga Embedded Systems - (supervisor 2)

Faculty

Electrical Engineering, Mathematics and Computer Science

Convolutional Neural Networks (CNNs), Sparsity, Accelerators, Inference

To reference this document use:

http://resolver.tudelft.nl/uuid:615a9965-3685-439e-8599-9c913b9902da

Published Date

24-06-2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Convolutional neural networks (CNNs) are often pruned to speed up training and inference and to reduce memory requirements. However, most modern GPUs cannot automatically exploit the resulting sparsity during computation, especially in networks with unstructured sparsity. Many libraries that exploit sparsity have therefore been proposed to accelerate CNN inference on GPUs, but little research compares them systematically. In this paper, several state-of-the-art libraries for accelerating sparse CNN inference on GPUs are reviewed and benchmarked. Most of the libraries speed up the convolution and/or pooling operations by skipping computations that involve zeros, which lets them perform sparse matrix calculations faster. However, many of them have hardware and software restrictions and are hard to integrate into a new model for end-to-end inference.

Files

A_Survey_on_Accelerating_Spars... (.pdf)

(.pdf | 2.33 Mb)