Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform


Abstract

In this article, we discuss the performance modeling and optimization of Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPUs using CUDA. SpMV has a very low computation-to-data ratio, so its performance is bound mainly by memory bandwidth. We propose two optimizations for ELLPACK-based SpMV: (1) improving access to the dense vector by reducing cache misses, and (2) reducing the volume of accessed matrix data through index compression. Matrix bandwidth reduction techniques enable both the cache-usage enhancement and the index compression. For GPUs with better cache support, we further propose a differentiated memory-access scheme that avoids contamination of the caches by matrix data. Performance evaluation shows that the combined speedups of the proposed optimizations are 16% (single precision) and 12.6% (double precision) on the GT-200 GPU, and 19% (single precision) and 15% (double precision) on the GF-100 GPU.