Preconditioned Krylov Solvers under Shared-Memory Parallelism

Evaluating Convergence, Scalability, and Parallel Overhead

Bachelor Thesis (2025)
Author(s)

H.J.G. Reijersen van Buuren (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Heinlein – Mentor (TU Delft - Numerical Analysis)

D.J.P. Lahaye – Graduation committee member (TU Delft - Mathematical Physics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
18-12-2025
Awarding Institution
Delft University of Technology
Programme
Applied Mathematics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis investigates how preconditioned Krylov subspace methods perform and scale under shared-memory parallelism. The focus is on the Conjugate Gradient (CG) method for symmetric positive definite systems and the Generalized Minimal Residual (GMRES) method for non-symmetric systems. Both solvers are implemented in PyKokkos and applied to finite element discretisations generated with NGSolve. We consider a scalar Laplace problem, a Stokes-like vector problem, and steady Stokes flow around a NACA 2412 airfoil.

For CG, we study Jacobi and symmetric Gauss–Seidel (SGS) preconditioning, strong and weak scaling up to 16 threads, and kernel-level timings. As the mesh is refined, the iteration count grows in line with the increasing condition number of the stiffness matrix. Jacobi reduces the iteration count only slightly but is cheap and fully parallel, leading to runtimes similar to, and sometimes slightly better than, unpreconditioned CG. SGS roughly halves the iteration count, but its forward and backward sweeps are largely sequential, which limits speed-up on many cores and can make SGS slower overall despite its faster convergence.

For GMRES, we analyse the influence of the restart parameter, preconditioning, and polynomial order. Higher-order vector elements lead to more off-diagonal entries in the system matrix; scalar Jacobi then becomes too weak and can even make restarted GMRES slower than using no preconditioner at all. SGS remains effective in terms of iterations, but with the same parallel limitations as in CG, and overall GMRES shows poor strong and weak scaling on the tested CPUs. For the NACA Stokes system, Jacobi and SGS preconditioning both fail, whereas a block preconditioner that respects the velocity–pressure structure converges rapidly.

Overall, the results show that good performance on shared-memory architectures requires preconditioners that both respect the block structure of the PDE and are highly parallelizable.
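The trade-off between convergence and parallelism described above can be made concrete with a small sketch. The following is a minimal NumPy/SciPy illustration of preconditioned CG with Jacobi and SGS preconditioners, not the thesis's PyKokkos implementation; the dense toy matrix and the `pcg`, `jacobi`, and `sgs` helpers are illustrative stand-ins. Note how applying Jacobi is one independent division per unknown, while applying SGS requires a forward and a backward triangular sweep.

```python
import numpy as np
from scipy.linalg import solve_triangular

def pcg(A, b, apply_Minv, tol=1e-8, maxit=1000):
    """Preconditioned Conjugate Gradient for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxit + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

def jacobi(A):
    # Fully parallel: one independent division per unknown.
    d = np.diag(A).copy()
    return lambda r: r / d

def sgs(A):
    # M = (D + L) D^{-1} (D + U); applying M^{-1} needs a forward and a
    # backward triangular sweep, which are inherently sequential.
    D = np.diag(np.diag(A))
    DL = np.tril(A)   # D + L
    DU = np.triu(A)   # D + U
    def apply(r):
        y = solve_triangular(DL, r, lower=True)
        return solve_triangular(DU, D @ y, lower=False)
    return apply

# Toy 1D Laplace matrix with a varying diagonal so Jacobi is non-trivial;
# typically SGS needs the fewest iterations here, matching the qualitative
# picture in the abstract.
n = 200
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A += np.diag(np.linspace(0.0, 5.0, n))
b = np.ones(n)
for name, Minv in [("none", lambda r: r), ("Jacobi", jacobi(A)), ("SGS", sgs(A))]:
    _, its = pcg(A, b, Minv)
    print(f"{name:>6}: {its} iterations")
```

In a parallel implementation, the `jacobi` apply maps directly onto a `parallel_for` over unknowns, whereas the two `solve_triangular` sweeps in `sgs` impose a sequential dependency chain, which is exactly the bottleneck the abstract attributes to SGS.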
GitHub: https://github.com/Hugoreijersen/Krylov-Subspace-Methods.git
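As a companion sketch, the block-preconditioning idea for the Stokes saddle-point system [[A, Bᵀ], [B, 0]] can be illustrated with SciPy's restarted GMRES. Everything below is a synthetic placeholder, not the thesis's NGSolve/NACA system: the matrices are random stand-ins, and the identity used as a Schur-complement surrogate takes the role a pressure mass matrix would play in the Stokes setting.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

nu, npr = 80, 40                          # toy velocity / pressure dof counts

# Synthetic SPD velocity block A and divergence-like block B (placeholders).
R = sp.random(nu, nu, density=0.05, random_state=0)
A = (R + R.T) * 0.5 + 2.0 * sp.eye(nu)
B = sp.random(npr, nu, density=0.10, random_state=1)
K = sp.bmat([[A, B.T], [B, None]], format="csr")   # saddle-point matrix
b = np.ones(nu + npr)

# Block-diagonal preconditioner diag(A, S_hat): an exact solve with the
# velocity block and, as a crude Schur-complement surrogate, the identity.
A_solve = spla.factorized(A.tocsc())
def apply_P(r):
    z = np.empty_like(r)
    z[:nu] = A_solve(r[:nu])
    z[nu:] = r[nu:]
    return z

P = spla.LinearOperator(K.shape, matvec=apply_P)
x, info = spla.gmres(K, b, M=P, restart=30)        # restarted GMRES(30)
print("converged" if info == 0 else f"gmres info = {info}")
```

The point of the sketch is structural: a scalar preconditioner such as Jacobi sees only the zero pressure block and stalls, while a preconditioner built per velocity/pressure block respects the saddle-point structure, which is the behaviour the abstract reports for the NACA Stokes system.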

Files

Thesis_final_2_.pdf
(pdf | 1.97 MB)
License info not available