J. Thies | TU Delft Repository

Performance of linear solvers in tensor-train format on current multicore architectures

Journal article (2025) - Melven Röhrig-Zöllner (author) , Manuel Becklas (author) , Jonas Thies (author) , Achim Basermann (author)

Tensor networks are a class of algorithms aimed at reducing the computational complexity of high-dimensional problems. They are used in an increasing number of applications, from quantum simulations to machine learning. Exploiting data parallelism in these algorithms is key to us ...

Algebraic temporal blocking for sparse iterative solvers on multi-core CPUs

Journal article (2024) - Christie L. Alappat (author) , J. Thies (author) , Georg Hager (author) , Holger Fehske (author) , G. Wellein (author)

Sparse linear iterative solvers are essential for many large-scale simulations. Much of the runtime of these solvers is often spent in the implicit evaluation of matrix polynomials via a sequence of sparse matrix-vector products. A variety of approaches has been proposed to make ...

SIMD vectorization for simultaneous solution of locally varying linear systems with multiple right-hand sides

Journal article (2023) - Martin J. Kühn (author) , Johannes Holke (author) , Annette Lutz (author) , Jonas Thies (author) , Melven Röhrig-Zöllner (author) , Alexander Bleh (author) , Jan Backhaus (author) , Achim Basermann (author)

Developments in numerical simulation of flows and high-performance computing influence one another. More detailed simulation methods create a permanent need for more computational power, while new hardware developments often require changes to the software to exploit new hardware ...

Performance of the low-rank TT-SVD for large dense tensors on modern multicore CPUs

Journal article (2022) - Melven Röhrig-Zöllner (author) , J. Thies (author) , Achim Basermann (author)

There are several factorizations of multidimensional tensors into lower-dimensional components, known as “tensor networks.” We consider the popular “tensor-train” (TT) format and ask, How efficiently can we compute a low-rank approximation from a full tensor on current multic ...

Tensor product scheme for computing bound states of the quantum mechanical three-body problem

Journal article (2022) - Jonas Thies (author) , M.T.R. Hof (author) , Matthias Zimmermann (author) , Maxim Efremov (author)

We develop a computationally and numerically efficient method to calculate binding energies and corresponding wave functions of quantum mechanical three-body problems in low dimensions. Our approach exploits the tensor structure of the multidimensional stationary Schrödinger equa ...

A staggered-grid multilevel incomplete LU for steady incompressible flows

Journal article (2021) - Sven Baars (author) , Mark van Der Klok (author) , Jonas Thies (author) , Fred W. Wubs (author)

Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs

Journal article (2021) - Dominik Ernst (author) , Georg Hager (author) , Jonas Thies (author) , Gerhard Wellein (author)

General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEMM) in vendor-supplied BLAS libraries are best optimized for square matrices but often show bad performance for tall & skinny matrices, which are much taller than wide. NVIDIA’s ...

Towards Scalable Automatic Exploration of Bifurcation Diagrams for Large-Scale Applications

Conference paper (2021) - Jonas Thies (author) , Michiel Wouters (author) , Rebekka Sarah Hennig (author) , Wim Vanroose (author)

The Trilinos library LOCA (http://www.cs.sandia.gov/LOCA/) allows computing branches of steady states of large-scale dynamical systems like (discretized) nonlinear PDEs. The core algorithms typically are (pseudo-)arclength continuation, Newton–Krylov methods and (sparse) eigenva ...

A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication

Journal article (2020) - Christie Alappat (author) , Achim Basermann (author) , Alan R. Bishop (author) , Holger Fehske (author) , Georg Hager (author) , Olaf Schenk (author) , J. Thies (author) , Gerhard Wellein (author)

The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations or graph traversal applications. Parallelizing SymmSpMV on today's multicore platforms with up to 100 cores is difficult due to the need ...

PHIST

A Pipelined, Hybrid-Parallel Iterative Solver Toolkit

Journal article (2020) - J. Thies (author) , Melven Röhrig-Zöllner (author) , Nigel Overmars (author) , Achim Basermann (author) , Dominik Ernst (author) , Georg Hager (author) , Gerhard Wellein (author)

The increasing complexity of hardware and software environments in high-performance computing poses big challenges on the development of sustainable and hardware-efficient numerical software. This article addresses these challenges in the context of sparse solvers. Existing solut ...

ESSEX

Equipping sparse solvers for exascale

Book chapter (2020) - Christie L. Alappat (author) , Andreas Alvermann (author) , Moritz Kreutzer (author) , Bruno Lang (author) , Kengo Nakajima (author) , Melven Röhrig-Zöllner (author) , Tetsuya Sakurai (author) , Faisal Shahzad (author) , Jonas Thies (author) , Gerhard Wellein (author) , Achim Basermann (author) , Holger Fehske (author) , Yasunori Futamura (author) , Martin Galgon (author) , Georg Hager (author) , Sarah Huber (author) , Akira Imakura (author) , Masatoshi Kawai (author)

The ESSEX project has investigated programming concepts, data structures, and numerical algorithms for scalable, efficient, and robust sparse eigenvalue solvers on future heterogeneous exascale systems. Starting without the burden of legacy code, a holistic performance engineerin ...

Performance engineering for a tall & skinny matrix multiplication kernel on GPUs

Conference paper (2020) - Dominik Ernst (author) , Georg Hager (author) , Jonas Thies (author) , Gerhard Wellein (author)

General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized for square matrices but often show bad performance for tall & skinny matrices, which are much taller than wide. Nvidia’s current CUBLAS implementation delivers only a fraction of ...

Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects

Journal article (2019) - Andreas Alvermann (author) , Achim Basermann (author) , Thomas Huckle (author) , Akihiro Ida (author) , Akira Imakura (author) , Masatoshi Kawai (author) , Simone Koecher (author) , Moritz Kreutzer (author) , Pavel Kus (author) , Bruno Lang (author) , Hermann Lederer (author) , Valeriy Manin (author) , Hans-Joachim Bungartz (author) , Andreas Marek (author) , Kengo Nakajima (author) , Lydia Nemec (author) , Karsten Reuter (author) , Michael Rippl (author) , Melven Röhrig-Zöllner (author) , Tetsuya Sakurai (author) , Matthias Scheffler (author) , Christoph Scheurer (author) , Faisal Shahzad (author) , Christian Carbogno (author) , Danilo Simoes Brambila (author) , J. Thies (author) , Gerhard Wellein (author) , Dominik Ernst (author) , Holger Fehske (author) , Yasunori Futamura (author) , Martin Galgon (author) , Georg Hager (author) , Sarah Huber (author)

We first briefly report on the status and recent achievements of the ELPA-AEO (Eigenvalue Solvers for Petaflop Applications—Algorithmic Extensions and Optimizations) and ESSEX II (Equipping Sparse Solvers for Exascale) projects. In both collaboratory efforts, scientists from the ...

CRAFT

A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance

Journal article (2018) - Faisal Shahzad (author) , J. Thies (author) , Moritz Kreutzer (author) , Thomas Zeiser (author) , Georg Hager (author) , Gerhard Wellein (author)

In order to efficiently use the future generations of supercomputers, fault tolerance and power consumption are two of the prime challenges anticipated by the High Performance Computing (HPC) community. Checkpoint/Restart (CR) has been and still is the most widely used technique ...

Numerical bifurcation analysis of a 3D Turing-type reaction–diffusion model

Journal article (2018) - Weiyan Song (author) , Fred Wubs (author) , J. Thies (author) , Sven Baars (author)

We perform a numerical study of a two-component reaction–diffusion model. By using numerical continuation methods, combined with state-of-the-art sparse linear and eigenvalue solvers, we systematically compute steady state solutions and analyze their stability and relations in bo ...

GHOST

Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems

Journal article (2017) - Moritz Kreutzer (author) , J. Thies (author) , Melven Röhrig-Zöllner (author) , Andreas Pieper (author) , Faisal Shahzad (author) , Martin Galgon (author) , Achim Basermann (author) , Holger Fehske (author) , Georg Hager (author) , Gerhard Wellein (author)

While many of the architectural details of future exascale-class high performance computer systems are still a matter of intense research, there appears to be a general consensus that they will be strongly heterogeneous, featuring “standard” as well as “accelerated” resources. To ...

Improved coefficients for polynomial filtering in ESSEX

Conference paper (2017) - Martin Galgon (author) , Lukas Krämer (author) , Achim Basermann (author) , Melven Röhrig-Zöllner (author) , Jonas Thies (author) , Bruno Lang (author) , Andreas Alvermann (author) , Holger Fehske (author) , Andreas Pieper (author) , Georg Hager (author) , Moritz Kreutzer (author) , Faisal Shahzad (author) , Gerhard Wellein (author)

The ESSEX project is an ongoing effort to provide exascale-enabled sparse eigensolvers, especially for quantum physics and related application areas. In this paper we first briefly summarize some key achievements that have been made within this project. Then we focus on a project ...

Performance engineering and energy efficiency of building blocks for large, sparse eigenvalue computations on heterogeneous supercomputers

Conference paper (2016) - Moritz Kreutzer (author) , Jonas Thies (author) , Georg Hager (author) , Bruno Lang (author) , Gerhard Wellein (author) , Andreas Pieper (author) , Andreas Alvermann (author) , Martin Galgon (author) , Melven Röhrig-Zöllner (author) , Faisal Shahzad (author) , Achim Basermann (author) , Alan R. Bishop (author) , Holger Fehske (author)

Numerous challenges have to be mastered as applications in scientific computing are being developed for post-petascale parallel systems. While ample parallelism is usually available in the numerical problems at hand, the efficient use of supercomputer resources requires not only ...

Towards an exascale enabled sparse solver repository

Conference paper (2016) - Jonas Thies (author) , Martin Galgon (author) , Bruno Lang (author) , Gerhard Wellein (author) , Faisal Shahzad (author) , Andreas Alvermann (author) , Moritz Kreutzer (author) , Andreas Pieper (author) , Melven Röhrig-Zöllner (author) , Achim Basermann (author) , Holger Fehske (author) , Georg Hager (author)

As we approach the exascale computing era, disruptive changes in the software landscape are required to tackle the challenges posed by manycore CPUs and accelerators. We discuss the development of a new ‘exascale enabled’ sparse solver repository (the ESSR) that addresses these c ...

On the parallel iterative solution of linear systems arising in the FEAST algorithm for computing inner eigenvalues

Journal article (2015) - Martin Galgon (author) , Lukas Krämer (author) , J. Thies (author) , Achim Basermann (author) , Bruno Lang (author)

Methods for the solution of sparse eigenvalue problems that are based on spectral projectors and contour integration have recently attracted more and more attention. Such methods require the solution of many shifted sparse linear systems of full size. In most of the literature co ...