Efficient Iterative Solution of Large Linear Systems on Heterogeneous Computing Systems

Abstract

This dissertation deals mainly with the design, implementation, and analysis of efficient iterative solution methods for large sparse linear systems on distributed and heterogeneous computing systems, as found in Grid computing. First, a case study is performed on iteratively solving large symmetric linear systems on both a multi–cluster and a local heterogeneous cluster, using standard block Jacobi preconditioning within the software constraints of standard Grid middleware and the algorithmic constraints of preconditioned Conjugate Gradient–type methods. This study shows that global synchronisation is a key bottleneck operation. To alleviate this bottleneck, three main strategies are proposed: exploiting the hierarchical structure of multi–clusters, using asynchronous iterative methods as preconditioners, and minimising the number of inner products in Krylov subspace methods.

Asynchronous iterative methods have never really been applied successfully to the solution of extremely large sparse linear systems, mainly because their slow convergence rates limit their applicability. Nevertheless, the lack of global synchronisation points in these methods is a highly favourable property in heterogeneous computing systems. Krylov subspace methods offer significantly improved convergence rates, but the global synchronisation points induced by the inner product operations in each iteration step limit their applicability. By using an asynchronous iterative method as a preconditioner in a flexible Krylov subspace method, the best of both worlds is combined. It is shown that this hybrid combination of a slow but asynchronous inner iteration and a fast but synchronous outer iteration results in high convergence rates on heterogeneous networks of computers; a minimal illustrative sketch of such an inner–outer combination is given at the end of this abstract. Since the preconditioning iteration is performed on heterogeneous computing hardware, it varies in each iteration step, so a flexible iterative method that can handle a varying preconditioner has to be employed. This partially asynchronous algorithm is implemented on two different types of Grid hardware and applied to two different applications, using two different types of Grid middleware.

The IDR(s) method and its variants are new and powerful algorithms for iteratively solving large nonsymmetric linear systems. Four techniques are used to construct an efficient IDR(s) variant for parallel computing and, in particular, for Grid computing. Firstly, an efficient and robust IDR(s) variant is presented that has a single global synchronisation point per matrix–vector multiplication step. Secondly, the so–called IDR test matrix in IDR(s) can be chosen freely, and this matrix is constructed such that the work, communication, and storage involving it are minimised in the context of multi–clusters. Thirdly, a methodology is presented for the a priori estimation of the optimal value of s in IDR(s). Finally, the proposed IDR(s) variant is combined with an asynchronous preconditioning iteration. By using an asynchronous preconditioner, IDR(s) is treated as a flexible method in which the preconditioner changes in each iteration step. To begin analysing mathematically the effect of a varying preconditioning operator on the convergence properties of IDR(s), the IDR(s) method is interpreted as a special type of deflation method. This leads to a better understanding of the core structure of IDR(s) methods and, in particular, provides an intuitive explanation for their excellent convergence properties.
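
To make the deflation viewpoint concrete, recall the recursion that defines the IDR residual spaces, stated here in the generic form used in the IDR(s) literature rather than reproduced from this dissertation: starting from the full Krylov subspace $\mathcal{G}_0 = \mathcal{K}^{N}(A, r_0)$ and a test matrix $P \in \mathbb{C}^{N \times s}$ with $\mathcal{S} = \mathcal{N}(P^{H})$, the nested Sonneveld spaces are
$$
\mathcal{G}_j = (I - \omega_j A)\bigl(\mathcal{G}_{j-1} \cap \mathcal{S}\bigr), \qquad j = 1, 2, \ldots,
$$
with nonzero scalars $\omega_j$. Under mild assumptions the spaces satisfy $\mathcal{G}_j \subset \mathcal{G}_{j-1}$ with strictly decreasing dimension, and IDR(s) forces its residuals into these shrinking spaces; reading the intersection with $\mathcal{S}$ as a deflation step is what underlies the intuitive explanation of the convergence behaviour mentioned above.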
Two applications from computational fluid dynamics are considered: large bubbly flow problems and large (more general) convection–diffusion problems, both in 2D and 3D. However, the techniques presented can be applied to a wide range of scientific applications. Large–scale numerical experiments are performed on two heterogeneous computing platforms: (i) local networks of non–dedicated computers and (ii) a dedicated cluster of clusters linked by a high–speed network. The numerical experiments not only demonstrate the effectiveness of the proposed techniques, but also illustrate the theoretical results.
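
The inner–outer combination described above can be illustrated with a minimal sketch. The following Python code is an illustration only, not the dissertation's implementation: a flexible GMRES outer iteration whose preconditioner consists of a varying number of Jacobi sweeps, where the randomly varying sweep count stands in for an asynchronous inner iteration that never applies exactly the same operator twice. The test problem, sweep counts, and tolerances are illustrative assumptions.

```python
# Illustrative sketch (not the dissertation's code): a flexible outer iteration
# with a Jacobi-sweep inner iteration that changes every outer step.
import numpy as np
import scipy.sparse as sp


def jacobi_sweeps(A, b, num_sweeps):
    """Approximate A^{-1} b with a fixed number of Jacobi sweeps."""
    d_inv = 1.0 / A.diagonal()
    x = np.zeros_like(b)
    for _ in range(num_sweeps):
        x = x + d_inv * (b - A @ x)
    return x


def flexible_gmres(A, b, max_iter=100, tol=1e-8, seed=0):
    """Flexible GMRES: the preconditioned vectors Z are stored explicitly,
    so the preconditioner may differ from one outer iteration to the next."""
    rng = np.random.default_rng(seed)
    beta = np.linalg.norm(b)
    V = [b / beta]                        # orthonormal Arnoldi basis (x0 = 0)
    Z = []                                # preconditioned search directions
    H = np.zeros((max_iter + 1, max_iter))
    for j in range(max_iter):
        # Varying preconditioner: a different number of inner sweeps each step,
        # mimicking an asynchronous inner iteration on heterogeneous hardware.
        z = jacobi_sweeps(A, V[j], int(rng.integers(3, 10)))
        Z.append(z)
        w = A @ z
        for i in range(j + 1):            # Arnoldi orthogonalisation
            H[i, j] = V[i] @ w
            w = w - H[i, j] * V[i]
        H[j + 1, j] = np.linalg.norm(w)
        # Small least-squares problem for the minimal-residual coefficients.
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)[0]
        res = np.linalg.norm(H[:j + 2, :j + 1] @ y - e1)
        if res < tol * beta or H[j + 1, j] < 1e-14:
            break
        V.append(w / H[j + 1, j])
    return np.column_stack(Z) @ y


# Toy 1D Poisson system as a stand-in for the large sparse systems in the text.
n = 100
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = flexible_gmres(A, b)
print("residual norm:", np.linalg.norm(b - A @ x))
```

In a truly asynchronous setting the inner sweeps would run concurrently on the processors without synchronising; the point of the sketch is only that the outer method must store the preconditioned vectors, because the preconditioning operator differs in every outer iteration step.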