Rene Miedema | TU Delft Repository

Decoupling model descriptions from execution

A modular paradigm for extensible neurosimulation with EDEN

Journal article (2025) - S. Panagiotou, R. Miedema, D. Soudris, C. Strydis

Computational-neuroscience simulators have traditionally been constrained by tightly coupled simulation engines and modeling languages, limiting their flexibility and scalability. Retrofitting these platforms to accommodate new backends is often costly, and sharing models across simulators remains cumbersome. This paper puts forward an alternative approach based on the EDEN neural simulator, which introduces a modular stack that decouples abstract model descriptions from execution. This architecture enhances flexibility and extensibility by enabling seamless integration of multiple backends, including hardware accelerators, without extensive reprogramming. Through the use of NeuroML, simulation developers can focus on high-performance execution, while model users benefit from improved portability without the need to implement custom simulation engines. Additionally, the proposed method for incorporating arbitrary simulation platforms—from model-optimized code kernels to custom hardware devices—as backends offers a more sustainable and adaptable framework for the computational-neuroscience community. The effectiveness of EDEN's approach is demonstrated by integrating two distinct backends: flexHH, an FPGA-based accelerator for extended Hodgkin-Huxley networks, and SpiNNaker, the well-known, neuromorphic platform for large-scale spiking neural networks. Experimental results show that EDEN integrates the different backends with minimal effort while maintaining competitive performance, reaffirming it as a robust, extensible platform that advances the design paradigm for neural simulators by achieving high generality, performance, and usability. ...

Tricking AI chips into simulating the human brain

A detailed performance analysis

Journal article (2024) - Lennart P.L. Landsmeer, Max C.W. Engelen, Rene Miedema, Christos Strydis

In recent years, significant strides in Artificial Intelligence (AI) have led to various practical applications, primarily centered around training and deployment of deep neural networks (DNNs). These applications, however, require considerable computational resources, predominantly reliant on modern Graphics-Processing Units (GPUs). Yet, the quest for larger and faster DNNs has spurred the creation of specialized AI chips, and efficient Machine-Learning (ML) software tools like TensorFlow and PyTorch have been developed to strike a balance between usability and performance. Simultaneously, the field of computational neuroscience shares a similar quest for increased computational power to simulate more extensive and detailed brain models, while also keeping usability high. Although GPUs have also entered this field, programming complexity remains high, resulting in cumbersome simulations. Inspired by AI progress, we introduce a workflow for easily accelerating brain simulations using TensorFlow and evaluate the performance of various cutting-edge AI chips – including the Graphcore Intelligence-Processing Unit (IPU), GroqChip, Nvidia GPU with Tensor Cores, and Google Tensor-Processing Unit (TPU) – when simulating a biologically detailed brain model as well as simpler ones. Our model simulations explore the architectural tradeoffs of a modern-day CPU and these four AI platforms by varying computational density, memory requirements and floating-point numerical accuracy. Results show that the GroqChip achieves the best performance for small networks, yet is unable to simulate large-scale networks. At the scale of mammalian brains, the GPU, IPU and TPU achieve speedups ranging from 29x to 1,208x over CPU runtimes. Remarkably, the TPU sets a new record for the largest real-time simulation of the inferior-olivary nucleus in the brain. Reduced-accuracy floating-point implementations make some simulation results unreliable for brain research, notably for the GroqChip. Consequently, this work underscores the potential of ML libraries for accelerating brain simulations as well as the critical role of AI-chip numerical accuracy for biophysically realistic brain models. ...

ExaFlexHH

An exascale-ready, flexible multi-FPGA library for biologically plausible brain simulations

Journal article (2024) - Rene Miedema, Christos Strydis

Introduction: In-silico simulations are a powerful tool in modern neuroscience for enhancing our understanding of complex brain systems at various physiological levels. To model biologically realistic and detailed systems, an ideal simulation platform must possess: (1) high performance and performance scalability, (2) flexibility, and (3) ease of use for non-technical users. However, most existing platforms and libraries do not meet all three criteria, particularly for complex models such as the Hodgkin-Huxley (HH) model or for complex neuron-connectivity modeling such as gap junctions. Methods: This work introduces ExaFlexHH, an exascale-ready, flexible library for simulating HH models on multi-FPGA platforms. Utilizing FPGA-based Data-Flow Engines (DFEs) and the dataflow programming paradigm, ExaFlexHH addresses all three requirements. The library is also parameterizable and compliant with NeuroML, a prominent brain-description language in computational neuroscience. We demonstrate the performance scalability of the platform by implementing a highly demanding extended-Hodgkin-Huxley (eHH) model of the Inferior Olive using ExaFlexHH. Results: Model simulation results show linear scalability for unconnected networks and near-linear scalability for networks with complex synaptic plasticity, with a 1.99× performance increase using two FPGAs compared to a single-FPGA simulation, and 7.96× when using eight FPGAs in a scalable ring topology. Notably, our results also reveal consistent performance efficiency in GFLOPS per watt, further facilitating exascale-ready computing speeds and pushing the boundaries of future brain-simulation platforms. Discussion: The ExaFlexHH library shows superior resource efficiency, quantified in FLOPS per hardware resources, benchmarked against other competitive FPGA-based brain simulation implementations. ...

A novel simulator for extended Hodgkin-Huxley neural networks

Conference paper (2020) - Sotirios Panagiotou, Rene Miedema, Harry Sidiropoulos, George Smaragdos, Christos Strydis, Dimitrios Soudris

Computational neuroscience aims to investigate and explain the behaviour and functions of neural structures through mathematical models. Due to the models' complexity, they can only be explored through computer simulation. Modern research in this field is increasingly adopting large networks of neurons, and diverse, physiologically-detailed neuron models, based on the extended Hodgkin-Huxley (eHH) formalism. However, existing eHH simulators either support highly specific neuron models, or they provide low computational performance, making model exploration costly in time and effort. This work introduces a simulator for extended Hodgkin-Huxley neural networks on multiprocessing platforms. This simulator supports a broad range of neuron models, while still providing high performance. Simulator performance is evaluated against varying neuron complexity parameters, network size and density, and thread-level parallelism. Results indicate that performance is within the range reported in existing literature for single-model eHH codes, and scales well for large CPU core counts. Ultimately, this application combines model flexibility with high performance, and can serve as a new tool in computational neuroscience. ...

FlexHH

A Flexible Hardware Library for Hodgkin-Huxley-Based Neural Simulations

Journal article (2020) - Rene Miedema, Georgios Smaragdos, Mario Negrello, Zaid Al-Ars, Matthias Möller, Christos Strydis

The Hodgkin-Huxley (HH) neuron is one of the most biophysically-meaningful models used in computational neuroscience today. Ironically, the model's high experimental value is offset by its disproportionate computational complexity, to such an extent that neuroscientists have resorted either to simpler models, losing precious neuron detail, or to high-performance computing systems to accelerate complex models. However, multicore/multinode CPU-based systems have proven too slow while FPGA-based ones have proven too time-consuming to (re)deploy to. Clearly, a solution that bridges user friendliness and high speedups is necessary. This paper presents flexHH, a flexible FPGA library implementing five popular, highly parameterizable variants of the HH neuron model. flexHH is the first crucial step towards making FPGA-based simulations of compute-intensive neural models available to neuroscientists without the debilitating penalty of re-engineering and re-synthesis. Through flexHH, the user can instantiate custom models and immediately take advantage of the acceleration without the mediation of an engineer, which has proven to be a major inhibitor to full adoption of FPGAs in neuroscience labs. In terms of performance, flexHH achieves speedups between 8× and 20× compared to sequential-C implementations, while only a small drop in real-time capabilities is observed when compared to hardcoded FPGA-based versions of the models tested. ...

Synthesis-Free, Flexible and Fast Hardware Library for Biophysically Plausible Neurosimulations

Conference paper (2020) - Rene Miedema, Georgios Smaragdos, Mario Negrello, Zaid Al-Ars, Matthias Möller, Christos Strydis

Computational neuroscience uses models to study the brain. The Hodgkin-Huxley (HH) model, together with its extensions, is one of the most powerful, biophysically meaningful models currently used. The high experimental value of the (extended) Hodgkin-Huxley (eHH) models comes at the cost of steep computational requirements. Consequently, for larger networks, neuroscientists either opt for simpler models, losing neuro-computational features, or use high-performance computing systems. The eHH models can be efficiently implemented as a dataflow application on an FPGA-based architecture. However, state-of-the-art FPGA-based implementations have proven to be time-consuming to use because of their long synthesis times. We have developed flexHH, a flexible hardware library, compatible with a widely used neuron-model description format, implementing five FPGA-accelerated and parameterizable variants of eHH models (standard HH with optional extensions: custom ion-gates, gap junctions, and/or multiple cell compartments). Therefore, flexHH is a crucial step towards high-flexibility and high-performance FPGA-based simulations, eschewing the penalty of re-engineering and re-synthesis and dismissing the need for an engineer. In terms of performance, flexHH achieves a speedup of 1,065x against NEURON, the simulator standard in computational neuroscience, and speedups between 8x and 20x against sequential C. Furthermore, flexHH is faster per simulation step compared to other HPC technologies, provides 65% or better performance density (in FLOPS/LUT) compared to related works, and only shows a marginal performance drop in real-time simulations. ...

High-Performance Hardware Accelerators for Solving Ordinary Differential Equations

Conference paper (2017) - Ioannis Stamoulias, Matthias Möller, Rene Miedema, Christos Strydis, Christoforos Kachris, Dimitrios Soudris

Ordinary Differential Equations (ODEs) are widely used in many high-performance computing applications. However, contemporary processors generally provide limited throughput for these kinds of calculations. A high-performance hardware accelerator has been developed for speeding up the solution of ODEs. The hardware accelerator has been developed for both single- and double-precision floating-point types, and a design-space exploration has been performed in terms of performance and hardware resources. The hardware accelerator has been mapped to an FPGA board and connected through PCIe to a typical processor. The performance evaluation shows that the proposed scheme can achieve up to 14x speedup compared to a reference, single-core CPU solution. ...