Suitability of Shallow Water Solving Methods for GPU Acceleration

Abstract

Over the past 15 years, general-purpose computing on graphics processing units (GPUs) has matured and become increasingly mainstream. At the same time, the computational demands of simulation software keep growing. For many applications traditionally computed on the central processing unit (CPU), the question therefore arises whether moving to GPU computing is a cost-effective way of meeting these demands.
The same architectural features that make GPUs so cost-effective at bulk computation also restrict which applications are suitable for them. The shallow water equations are a simplified form of the Navier-Stokes equations and describe water levels and flow currents in suitably shallow water such as rivers, estuaries and the North Sea. The main research goal of this thesis project was to determine whether the shallow water equations are suitable for implementation on a GPU. Two options exist: the equations may be solved with either an explicit or an implicit time integration method. First, a literature study was conducted to become familiar with the tools required to build explicit and implicit shallow water models on a GPU. Then both an explicit and an implicit shallow water solver were developed, first in the MATLAB programming language and later in CUDA C++, on both CPU and GPU. The main findings are that both explicit and implicit methods are well suited for GPU implementation. Both methods proved to be compatible with a wetting and drying mechanism for numerical cells. The CUDA C++ implementation was on the order of 10 times as fast as the MATLAB implementation on both CPU and GPU. For the benchmark cases tested, the CUDA C++ GPU implementation was on the order of 50 times faster than the equivalent multithreaded CPU implementation.
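To illustrate the explicit option, the following is a minimal sketch of one explicit time step. It is not the solver developed in the thesis. It assumes the 1D linearized shallow water equations with constant depth on a staggered periodic grid, without wetting and drying, and it is written in Python for brevity rather than in MATLAB or CUDA C++. The names `cfl_time_step` and `explicit_step`, the forward-backward update, and all parameter values are hypothetical choices made for this example.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def cfl_time_step(dx, depth, courant=0.5):
    """Time step for a given Courant number, using the wave speed sqrt(G * depth)."""
    return courant * dx / math.sqrt(G * depth)


def explicit_step(eta, u, depth, dx, dt):
    """One forward-backward step on a staggered periodic grid (assumed scheme).

    eta[i] is the water level in cell i; u[i] is the velocity on the face
    between cell i-1 and cell i.
    """
    n = len(eta)
    # Continuity: d(eta)/dt = -depth * du/dx, using the faces of cell i.
    eta_new = [eta[i] - dt * depth * (u[(i + 1) % n] - u[i]) / dx for i in range(n)]
    # Momentum: du/dt = -G * d(eta)/dx, using the updated water levels.
    u_new = [u[i] - dt * G * (eta_new[i] - eta_new[i - 1]) / dx for i in range(n)]
    return eta_new, u_new


# Hypothetical test case: a small bump in otherwise still water.
n, dx, depth = 50, 100.0, 10.0
eta = [0.0] * n
eta[n // 2] = 0.1
u = [0.0] * n
dt = cfl_time_step(dx, depth)

mass_before = sum(eta)
for _ in range(200):
    eta, u = explicit_step(eta, u, depth, dx, dt)
mass_after = sum(eta)
```

In this sketch each cell is updated from its immediate neighbours only, which is the kind of independent per-cell work that tends to map well onto one GPU thread per cell. The price is that the time step is bounded by a stability limit of the CFL type.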
The implicit implementation was benchmarked using the conjugate gradient method to solve the linear system. Various preconditioners were tested, and a Repeated Red-Black (RRB) preconditioner was found to be the most effective. The computation time of the RRB-preconditioned implicit method was compared with that of the explicit method. The two methods reached parity in computation time when the implicit time step was roughly 50 times as large as the explicit time step: for smaller implicit time steps the explicit method was faster, and for larger ones the implicit method was faster. For the benchmark cases tested, the implicit method with a time step 50 times larger than the explicit time step was less accurate and less stable than the explicit method. The conclusion is that, for cases similar to the benchmark cases, an explicit method is the fastest, most stable and most accurate, and is thus the preferred choice.
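The preconditioned conjugate gradient iteration mentioned above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it swaps in a simple Jacobi (diagonal) preconditioner in place of the RRB preconditioner, and it assumes a small symmetric positive definite tridiagonal test system. The names `pcg`, `matvec` and `jacobi`, and the values of `DIAG` and `OFF`, are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(matvec, b, precond, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for a symmetric positive definite system."""
    x = [0.0] * len(b)
    r = list(b)        # residual b - A x for the initial guess x = 0
    z = precond(r)     # preconditioned residual
    p = list(z)        # first search direction
    rz = dot(r, z)
    for iteration in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            return x, iteration
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Assumed test system: tridiagonal matrix with DIAG on the diagonal, OFF beside it.
N, DIAG, OFF = 20, 2.5, -1.0


def matvec(v):
    out = []
    for i in range(N):
        s = DIAG * v[i]
        if i > 0:
            s += OFF * v[i - 1]
        if i < N - 1:
            s += OFF * v[i + 1]
        out.append(s)
    return out


def jacobi(r):
    # Jacobi preconditioner: divide by the diagonal (a stand-in for RRB).
    return [ri / DIAG for ri in r]


b = [1.0] * N
x, iterations = pcg(matvec, b, jacobi)
residual = max(abs(ax - bi) for ax, bi in zip(matvec(x), b))
```

Only the `precond` argument would change when a different preconditioner is tried; the rest of the iteration stays the same, which is what makes side-by-side preconditioner comparisons straightforward.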