A Survey on Accelerating Sparse CNN Inference on GPUs

Abstract

Convolutional neural networks (CNNs) are often pruned to speed up training and inference and to reduce memory requirements. Nevertheless, most modern GPUs cannot exploit this sparsity automatically during computation, especially in networks with unstructured sparsity. Many libraries that exploit sparsity have therefore been proposed for accelerating CNN inference on GPUs, but little research compares them systematically. In this paper, several state-of-the-art libraries for accelerating sparse CNN inference on GPUs are reviewed and benchmarked. Most of the libraries speed up the convolution and/or pooling operations by skipping computations that involve zeros, which lets them perform sparse matrix calculations faster. However, many of them have hardware and software restrictions and are hard to integrate into a new model to perform end-to-end inference.
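
To make the idea of skipping zero calculations concrete, the following is a minimal CPU-side sketch in plain Python, not the implementation of any surveyed library. The function names `dense_conv2d` and `sparse_conv2d` and the toy data are hypothetical and chosen for illustration only. The sketch assumes a single-channel input, valid padding, and an unstructured-pruned kernel whose nonzero weights are stored as a coordinate list.

```python
def dense_conv2d(image, kernel):
    """Reference 2D cross-correlation (valid padding) that visits every weight."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            for di in range(kh):
                for dj in range(kw):
                    out[i][j] += image[i + di][j + dj] * kernel[di][dj]
    return out


def sparse_conv2d(image, kernel):
    """Same result as dense_conv2d, but multiplies only by nonzero weights."""
    kh, kw = len(kernel), len(kernel[0])
    # Coordinate list of the pruned kernel: (row offset, column offset, weight).
    nonzeros = [
        (di, dj, w)
        for di, row in enumerate(kernel)
        for dj, w in enumerate(row)
        if w != 0
    ]
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Zero weights never appear in `nonzeros`, so they cost nothing here.
            for di, dj, w in nonzeros:
                out[i][j] += image[i + di][j + dj] * w
    return out


# Toy input and a 3x3 kernel with 7 of its 9 weights pruned to zero.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pruned_kernel = [
    [0, 0, 1],
    [0, 2, 0],
    [0, 0, 0],
]

dense_result = dense_conv2d(image, pruned_kernel)
sparse_result = sparse_conv2d(image, pruned_kernel)
```

In this toy case the sparse version performs 2 multiplications per output element instead of 9. On a GPU the saving is harder to realize, because the irregular memory accesses of unstructured sparsity can offset the reduced arithmetic.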