T.A. Hogervorst

info

Please Note

<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>

Conference paper (2)

Journal article (1)

3 records found

Hardware Acceleration of High-Performance Computational Flow Dynamics Using High-Bandwidth Memory-Enabled Field-Programmable Gate Arrays

Journal article (2022) - Tom Hogervorst, Razvan Nane, Giacomo Marchiori, Tong Dong Qiu, Markus Blatt, Alf Birger Rustad

Scientific computing is at the core of many High-Performance Computing applications, including computational flow dynamics. Because of the utmost importance to simulate increasingly larger computational models, hardware acceleration is receiving increased attention due to its potential to maximize the performance of scientific computing. Field-Programmable Gate Arrays could accelerate scientific computing because of the possibility to fully customize the memory hierarchy important in irregular applications such as iterative linear solvers. In this article, we study the potential of using Field-Programmable Gate Arrays in High-Performance Computing because of the rapid advances in reconfigurable hardware, such as the increase in on-chip memory size, increasing number of logic cells, and the integration of High-Bandwidth Memories on board. To perform this study, we propose a novel Sparse Matrix-Vector multiplication unit and an ILU0 preconditioner tightly integrated with a BiCGStab solver kernel. We integrate the developed preconditioned iterative solver in Flow from the Open Porous Media project, a state-of-the-art open source reservoir simulator. Finally, we perform a thorough evaluation of the FPGA solver kernel in both stand-alone mode and integrated in the reservoir simulator, using the NORNE field, a real-world case reservoir model using a grid with more than 105 cells and using three unknowns per cell. ...

Sparstition

A partitioning scheme for large-scale sparse matrix vector multiplication on FPGA

Conference paper (2019) - Bjorn Sigurbergsson, Tom Hogervorst, Tong Dong Qiu, Răzvan Nane

Sparse Matrix Vector Multiplication (SpMV) is a key kernel in various domains, that is known to be difficult to parallelize efficiently due to the low spatial locality of data. This is problematic for computing large-scale SpMV due to limited cache sizes but also in achieving speedups through parallel execution. To address these issues, we present 1) sparstition, a novel partitioning scheme that enables computing SpMV without the need to do any major post-processing steps, and 2) a corresponding HLS-based hardware design that is able to perform large-scale SpMV efficiently. The design is pipelined so the matrix size is limited only by the size of the off-chip memory (DRAM) and not by the available on-chip memory (BRAMs). Our experimental results, performed on a ZedBoard, show that we achieve a computational throughput of up to 300 MFLOPS in single-precision and 108 MFLOPS in double-precision, an improvement of 2.6X on average compared to current state-of-the-art HLS results. Finally, we predict that sparstition can boost the computational throughput of HLS-based SpMV kernel to over 10 GFLOPS when using High Bandwidth Memories. ...

A Domain-Specific Language and Compiler for Computation-in-Memory Skeletons

Conference paper (2017) - Jintao Yu, Tom Hogervorst, Razvan Nane

Computation-in-Memory (CiM) is a new computer architecture template based on the in-memory computing paradigm. CiM can solve the memory-wall problem of classical Von Neumann-based computer systems by exploiting application-specific computational and data-flow patterns with the capability of performing both storage and computations of emerging resistive RAM technologies (e.g., memristors). However, to efficiently explore and design such radically new application-specific CiM architectures, we require fundamentally new algorithm specification and compilation techniques. In this paper, we introduce a domain-specific language to express not only the computational patterns of an algorithm but also its spatial characteristics. Furthermore, we design a compiler that is able to transform these patterns into highly-optimized CiM designs. Experiments demonstrate the functional correctness of the language and the compiler as well as an order of magnitude speedup improvement over a multicore system in both performance and energy costs. ...