Scalable GPU Acceleration for Complex Brain Simulations

Abstract

Complex mathematical models are used in computational neuroscience to simulate brain activity and understand the underlying biological processes. Simulating such models is computationally costly, and high-performance computing systems are therefore a natural means to increase performance.
This thesis implements a new, versatile multi-GPU eHH simulator (mgpuHH), explores its performance, and makes general observations on performance scalability across different modeling and cluster-configuration properties. The work offers a multi-node, multi-GPU solution that achieves excellent scalability thanks to how the simulator is constructed, using OpenMPI and CUDA. The simulator is configured through JSON files containing the neural descriptions and simulator-specific settings, enabling a user-friendly environment for neuroscientists without the need to recompile or understand the source code. The gap-junction calculations are identified as the critical function bottlenecking the simulator's performance; therefore, an algorithm tailored to GPU hardware is implemented to decrease the wall-clock time of these specific calculations. For internode
communication, OpenMPI can be configured in two ways: either share all compartment potentials with every node in the network, or share compartment potentials only with the nodes that need them. These methods rely internally on MPI_Allgather and MPI_Alltoallv, respectively. When available, GPUDirect, NVLink, and RDMA are supported. The implementation hides communication overhead, when possible, behind concurrently executing compute kernels. A neuron model from the inferior olivary nucleus is selected for benchmarking. Reported results go up to 32 nodes with a total of 64 GPU cards. The design shows linear weak and strong scaling within the experimental setups, both intranode and internode. With this simulator, networks of over 10 million cells can be modeled on large-scale GPU clusters, setting a new standard for eHH simulations. Comparisons against related work on CPUs and FPGAs have been conducted: a 100x speedup is achieved over a single-threaded CPU solution, a 2x speedup over an FPGA solution (flexHH), and a 10x speedup over a multi-threaded CPU solution (GenEHH, with 128 threads); both of the latter speedups are for a fully connected network of 7,000 IO cells.
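The abstract mentions that the simulator is driven by JSON configuration files holding the neural description and simulator settings. A sketch of what such a file might look like is shown below; the field names and values here are hypothetical, chosen only to illustrate the idea of separating model and simulator parameters from the source code:

```json
{
  "simulation": { "dt_ms": 0.025, "duration_ms": 1000, "seed": 42 },
  "network": {
    "cell_model": "io_ehh",
    "num_cells": 7000,
    "gap_junctions": { "topology": "all_to_all", "conductance_mS": 0.05 }
  },
  "output": { "record": ["soma_potential"], "file": "results.bin" }
}
```

A neuroscientist would only edit such a file to change the network size, connectivity, or recorded quantities, with no recompilation required.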
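The gap-junction calculation named above as the critical bottleneck can be sketched as follows. This is a plain-Python illustration, not the thesis's CUDA kernel; the voltage-dependent coupling factor f(dV) = 0.8·exp(−dV²/100) + 0.2 is the formulation commonly used for IO gap junctions and is an assumption here, as are the function and parameter names:

```python
import math

def gap_junction_currents(v, neighbors, weights):
    """Accumulate the gap-junction current flowing into each cell.

    v: membrane potentials (mV); neighbors[i] lists the cells coupled
    to cell i; weights[i] holds the matching conductances. The factor
    f(dV) = 0.8*exp(-dV^2/100) + 0.2 (assumed here) makes the coupling
    strength depend on the voltage difference between the two cells.
    """
    currents = []
    for i, vi in enumerate(v):
        acc = 0.0
        for j, w in zip(neighbors[i], weights[i]):
            dv = v[j] - vi
            acc += w * dv * (0.8 * math.exp(-dv * dv / 100.0) + 0.2)
        currents.append(acc)
    return currents
```

For a fully connected network of N cells this inner loop runs over N−1 neighbors per cell, i.e. O(N²) work per time step, which is why this function dominates wall-clock time and is the natural target for a GPU-tailored algorithm.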
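The trade-off between the two internode exchange strategies (MPI_Allgather versus MPI_Alltoallv) comes down to how much data crosses the network per time step. The small model below counts the potentials sent cluster-wide under each strategy; the input format and function name are hypothetical, but the counting reflects the collectives' semantics:

```python
def exchange_volumes(n_nodes, cells_per_node, conn):
    """Compare per-step exchange sizes for the two MPI strategies.

    conn[(src, dst)] = number of compartment potentials that node dst
    actually needs from node src (hypothetical node-level connectivity
    map). Returns (allgather_total, alltoallv_total), counted in
    potentials sent across the whole cluster per time step.
    """
    # MPI_Allgather: every node sends all of its compartments to
    # every other node, regardless of connectivity.
    allgather = n_nodes * (n_nodes - 1) * cells_per_node
    # MPI_Alltoallv: each node sends only what each peer needs.
    alltoallv = sum(v for (s, d), v in conn.items() if s != d)
    return allgather, alltoallv
```

For sparsely connected networks the Alltoallv-style exchange moves orders of magnitude less data, whereas for a fully connected network the two converge, which is consistent with offering both modes as configuration options.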