Fine-Grain Parallel RRB-Solver for 5-/9-Point Stencil Problems Suitable for GPU-Type Processors

Journal Article (2017)
Author(s)

M.A. de Jong (TU Delft - Numerical Analysis)

Auke van der Ploeg (Maritime Research Institute Netherlands (MARIN))

Auke Ditzel (Maritime Research Institute Netherlands (MARIN))

C Vuik (TU Delft - Numerical Analysis)

Research Group
Numerical Analysis
More Info
expand_more
Publication Year
2017
Language
English
Research Group
Numerical Analysis
Volume number
46
Pages (from-to)
375-393

Abstract

Preconditioners based on incomplete factorization are very popular for a fast convergence of the PCG-algorithm. However, these preconditioners are hard to parallelize since most operations are inherently sequential. In this paper we present the RRB-solver, which is a PCG-type solver using an incomplete Cholesky factorization based on the Repeated Red-Black (RRB) method. The RRB-solver scales nearly as well as Multigrid, and in this paper we show that this method can be parallelized very efficiently on modern computing architectures including GPUs. For an efficient parallel implementation a clever storage scheme turns out to be the key. The storage scheme is called r1=r2=b1=b2 and it ensures that memory transfers are coalesced throughout the algorithm, yielding near-optimal performance of the RRB-solver. The r1=r2=b1=b2-storage scheme in combination with a CUDA implementation on the GPU gives speedup factors of more than 30 compared to a sequential implementation on one CPU core for 5-/9-point stencils problems. A comparison with algebraic Multigrid further shows that the RRB-solver can be implemented very efficiently on the GPU.

No files available

Metadata only record. There are no files for this record.