Performance engineering and energy efficiency of building blocks for large, sparse eigenvalue computations on heterogeneous supercomputers

Conference Paper (2016)
Author(s)

Moritz Kreutzer (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Jonas Thies (Deutsches Zentrum für Luft- und Raumfahrt (DLR))

Andreas Pieper (Greifswald University)

Andreas Alvermann (Greifswald University)

Martin Galgon (Bergische Universität Wuppertal)

Melven Röhrig-Zöllner (Deutsches Zentrum für Luft- und Raumfahrt (DLR))

Faisal Shahzad (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Achim Basermann (Deutsches Zentrum für Luft- und Raumfahrt (DLR))

Alan R. Bishop (Los Alamos National Laboratory)

Holger Fehske (Greifswald University)

Georg Hager (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Bruno Lang (Bergische Universität Wuppertal)

Gerhard Wellein (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Affiliation
External organisation
DOI related publication
https://doi.org/10.1007/978-3-319-40528-5_14 Final published version
Publication Year
2016
Language
English
Pages (from-to)
317-338
Publisher
Springer
ISBN (print)
9783319405261
Event
International Conference on Software for Exascale Computing, SPPEXA 2015 (2016-01-25 - 2016-01-27), Munich, Germany

Abstract

Numerous challenges have to be mastered as applications in scientific computing are being developed for post-petascale parallel systems. While ample parallelism is usually available in the numerical problems at hand, the efficient use of supercomputer resources requires not only good scalability but also a verifiably effective use of resources on the core, the processor, and the accelerator level. Furthermore, power dissipation and energy consumption are becoming further optimization targets besides time-to-solution. Performance Engineering (PE) is the pivotal strategy for developing effective parallel code on all levels of modern architectures. In this paper we report on the development and use of low-level parallel building blocks in the GHOST library (“General, Hybrid, and Optimized Sparse Toolkit”). We demonstrate the use of PE in optimizing a density of states computation using the Kernel Polynomial Method, and show that reduction of runtime and reduction of energy are literally the same goal in this case. We also give a brief overview of the capabilities of GHOST and the applications in which it is being used successfully.
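For readers unfamiliar with the Kernel Polynomial Method mentioned in the abstract, the following is a minimal sketch of its core loop: Chebyshev moments of a sparse matrix are accumulated through repeated sparse matrix-vector products. It is an illustration only and not the GHOST API; the function names (`spmv`, `chebyshev_moments`) and the row-list matrix layout are hypothetical, and the sketch assumes the matrix spectrum has already been rescaled into [-1, 1]. Kernel damping and the reconstruction of the density of states from the moments are omitted.

```python
def dot(x, y):
    """Plain dot product of two equally long lists."""
    return sum(a * b for a, b in zip(x, y))


def spmv(rows, x):
    """Sparse matrix-vector product.

    rows[i] is a list of (column, value) pairs for row i
    (a hypothetical, CSR-like layout chosen for this sketch).
    """
    return [sum(v * x[j] for j, v in row) for row in rows]


def chebyshev_moments(rows, num_moments, start_vectors):
    """Estimate mu_m = Tr[T_m(H)] / N for m = 0 .. num_moments - 1.

    Assumes the spectrum of H lies in [-1, 1]. Each start vector r
    contributes <r|T_m(H)|r> / <r|r>; the result is the average over
    all start vectors. Using all unit basis vectors gives the exact
    normalized trace; a few random vectors give a stochastic estimate.
    """
    moments = [0.0] * num_moments
    for r in start_vectors:
        norm = dot(r, r)
        prev = list(r)        # T_0(H) r = r
        curr = spmv(rows, r)  # T_1(H) r = H r
        moments[0] += dot(r, prev) / norm
        if num_moments > 1:
            moments[1] += dot(r, curr) / norm
        for m in range(2, num_moments):
            # Chebyshev recurrence: T_m(H) r = 2 H T_{m-1}(H) r - T_{m-2}(H) r
            hv = spmv(rows, curr)
            nxt = [2.0 * a - b for a, b in zip(hv, prev)]
            moments[m] += dot(r, nxt) / norm
            prev, curr = curr, nxt
    return [mu / len(start_vectors) for mu in moments]


# Tiny example: H = diag(0.5, -0.5), traced exactly with the unit basis.
example_rows = [[(0, 0.5)], [(1, -0.5)]]
example_basis = [[1.0, 0.0], [0.0, 1.0]]
example_moments = chebyshev_moments(example_rows, 4, example_basis)
```

Nearly all of the work in this loop is the sparse matrix-vector product and the vector updates around it, which is why low-level building blocks of this kind are a natural target for performance engineering.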