S. Hamdioui

272 records found

Journal article (2026) - Mohammad Amin Yaldagard, Ankit Bende, Sumit Diware, Vikas Rana, Said Hamdioui, Rajendra Bishnoi
Resistive random-access memory (RRAM)-based computation-in-memory (CIM) architectures offer a promising solution to meet the stringent energy efficiency demands of executing artificial intelligence (AI) algorithms directly on edge devices. However, these architectures suffer from the read-disturb problem, which can lead to accumulated computational errors over time. To maintain the required level of computational accuracy, conventional approaches rely on a static reprogramming process after a predefined number of read cycles, necessitating large counters and resulting in inefficiencies. This paper presents experimental results using real RRAM devices to analyze the read-disturb effect and builds on these insights to propose a circuit-level detection methodology for real-time monitoring of conductance drifts. The proposed method initiates reprogramming only when the device drift exceeds a defined threshold, that is, only when reprogramming is actually needed. Additionally, an analytical method is developed to determine the minimum conductance state ratio needed to meet reliable detection criteria. Based on this foundation, the proposed detection technique is further optimized for dynamic identification of read-disturb effects. Experiment-augmented SPICE simulation results, using a calibrated model implemented in TSMC 40 nm CMOS technology, validate the functionality and effectiveness of the proposed detection approach. These results demonstrate its potential to improve both the reliability and the efficiency of RRAM-based CIM architectures, providing up to a 4x improvement in energy efficiency compared to traditional periodic reprogramming methods. ...
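To make the contrast between static and detection-triggered reprogramming concrete, here is a minimal Python sketch under a toy assumption: every read shifts the cell conductance by a known amount, and the hypothetical function `count_reprograms` tallies how often each policy reprograms. It does not model the paper's detection circuit; the drift values, threshold, and period are made up for illustration.

```python
def count_reprograms(drifts, threshold, period):
    """Count reprogramming events under two policies (toy model).

    drifts:    per-read conductance shifts (hypothetical, arbitrary units)
    threshold: accumulated drift that triggers reprogramming (detection policy)
    period:    reads between forced reprogramming events (static counter policy)
    """
    n_static = n_detect = 0
    deviation = 0.0  # drift accumulated since the last detection-triggered reprogram
    for i, d in enumerate(drifts, start=1):
        # Static policy: a counter forces reprogramming every `period` reads,
        # whether or not the device has actually drifted.
        if i % period == 0:
            n_static += 1
        # Detection policy: reprogram only once the drift crosses the threshold.
        deviation += d
        if abs(deviation) > threshold:
            n_detect += 1
            deviation = 0.0
    return n_static, n_detect


print(count_reprograms([0.125] * 10_000, threshold=100.0, period=100))  # (100, 12)
```

In this made-up case the conservatively tuned counter reprograms 100 times while the threshold policy reprograms 12 times; real savings depend on measured drift statistics.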
Conference paper (2026) - Y. Biyani, A. Singh, R. Bishnoi, S. Hamdioui
Analog Compute-in-Memory (CIM), leveraging non-volatile memristive devices to perform in-place computations in the analog domain, holds great potential to efficiently accelerate vector-matrix multiplications (VMM) and realize AI (Artificial Intelligence) at the edge. However, the data converters in such architectures often achieve accuracy only at the cost of high energy and area overheads, practically limiting the benefits of CIM. In this work, we present SABCIM, an array-periphery co-design approach for CIM that enables accurate computation as well as digitization of analog VMM outputs with high energy efficiency and competitive area overhead. By leveraging complementary input activations and data storage, each crossbar column generates a differential analog output corresponding to the vector-vector multiplication (VVM) result, while inherently addressing underlying non-idealities. This output is digitized using a compact, dual-ramp voltage-to-time converter (VTC)-based analog-to-digital converter (ADC). Benchmark results indicate that our work achieves up to 19.6× higher energy efficiency compared to state-of-the-art (SOTA) designs, while maintaining comparable accuracies. ...
Instruction Set Architecture (ISA) extensions, particularly scalar cryptography extensions (Zk), combine the performance advantages of hardware with the adaptability of software, enabling the direct and efficient execution of cryptographic functions within the processor pipeline. This integration eliminates the need to communicate with external cores, substantially reducing latency, power consumption, and hardware overhead, making it especially suitable for embedded systems with constrained resources. However, current scalar cryptography extension implementations remain vulnerable to physical threats, notably power side-channel attacks (PSCAs). These attacks allow adversaries to extract confidential information, such as secret keys, by analyzing the power consumption patterns of the hardware during operation. This paper presents an optimized and secure implementation of the RISC-V scalar Advanced Encryption Standard (AES) extension (Zkne/Zknd) using Domain-Oriented Masking (DOM) to mitigate first-order PSCAs. Our approach features optimized assembly implementations for partial rounds and key scheduling alongside pipeline-aware microarchitecture optimizations. We evaluated the security and performance of the proposed design using the Xilinx Artix-7 FPGA platform. The results indicate that our design is side-channel-resistant while adding an area overhead of only 0.39% to the full 32-bit CV32E40S RISC-V processor. Moreover, the performance overhead is zero when the extension-related instructions are properly scheduled. ...
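The nonlinear step that DOM has to protect is the AND of two masked bits. As an illustration of the general two-share DOM AND construction (not the authors' AES datapath), here is a minimal functional sketch in Python; `share` and `dom_and` are hypothetical helper names, and the model checks correctness only: it does not capture the register stage that hardware DOM uses to stop glitches.

```python
import itertools
import secrets


def share(x):
    """Split a secret bit x into two Boolean shares with x0 ^ x1 == x."""
    x0 = secrets.randbits(1)
    return x0, x ^ x0


def dom_and(a, b, z):
    """First-order DOM AND gate on two-share inputs.

    a, b: share pairs (a0, a1) and (b0, b1); z: one fresh random bit.
    Inner-domain terms (a0&b0, a1&b1) combine shares of a single domain only;
    cross-domain terms are blinded with z before being combined. In hardware the
    blinded cross-domain terms are registered; this functional model omits that.
    """
    a0, a1 = a
    b0, b1 = b
    q0 = (a0 & b0) ^ ((a0 & b1) ^ z)
    q1 = (a1 & b1) ^ ((a1 & b0) ^ z)
    return q0, q1


# Unmasking the output shares always yields a AND b, for every share split and z.
for a0, a1, b0, b1, z in itertools.product((0, 1), repeat=5):
    q0, q1 = dom_and((a0, a1), (b0, b1), z)
    assert q0 ^ q1 == (a0 ^ a1) & (b0 ^ b1)

q0, q1 = dom_and(share(1), share(1), secrets.randbits(1))
print(q0 ^ q1)  # 1
```

The fresh bit z cancels when the two output shares are recombined, which is why one random bit per AND suffices at first order.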
Computation-in-Memory (CIM) architectures address the rising demand for energy-efficient artificial intelligence (AI) solutions by minimizing costly data movements between memory and processor. Within such architectures, SRAM-based digital CIM is especially attractive as it preserves the advantages of CIM while avoiding analog complexity. Recent studies have revealed that these architectures are vulnerable, particularly to power side-channel attacks (SCAs) capable of extracting sensitive model parameters (e.g., neural network (NN) weights), which represent the intellectual property of CIM-based neural network systems. In this study, we propose and evaluate two countermeasures to secure SRAM-based CIM architectures against power attacks: (1) a Balanced Obfuscated-path countermeasure, and (2) a Glitch Aware countermeasure. To validate their effectiveness, we conducted a comprehensive power analysis that successfully demonstrated attacks against an unprotected implementation. Our experimental results demonstrate that both countermeasures significantly improve resistance to power attacks. Although the Balanced Obfuscated-path countermeasure offers lower area overhead and better run-time performance, the Glitch Aware approach achieves higher protection against advanced attacks, making each suitable for different design constraints. ...

Enhancing performance with temporal averaging and SIRENs

Journal article (2026) - Zacharia A. Rudge, Dominik Dold, Moritz Fieback, Dario Izzo, Said Hamdioui
Memristors are an emerging technology that enables artificial intelligence (AI) accelerators with high energy efficiency and radiation robustness, properties that are vital for the deployment of AI on-board spacecraft. However, space applications require reliable and precise computations, while memristive devices suffer from non-idealities, such as device variability, conductance drifts, and device faults. Thus, porting neural networks (NNs) to memristive devices often faces the challenge of severe performance degradation. In this work, we show in simulations that memristor-based NNs achieve competitive performance levels on on-board tasks, such as navigation & control and geodesy of asteroids. Through bit-slicing, temporal averaging of NN layers, and periodic activation functions, we improve the initial results for the two tasks from around 0.07 to 0.01 and from 0.3 to 0.007, respectively, using RRAM devices, coming close to state-of-the-art levels (0.003−0.005 and 0.003, respectively). Our results demonstrate the potential of memristors for on-board space applications, and we are convinced that future technology and NN improvements will further close the performance gap to fully unlock the benefits of memristors. ...
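Temporal averaging can be illustrated with a toy device model: repeat the same noisy layer evaluation several times and average the outputs. The sketch below assumes Gaussian read noise with a made-up σ that is redrawn on every evaluation; static programming errors would not average out this way. The sine activation follows the SIREN form sin(ω0·z), with ω0 = 30 assumed here. All function names are hypothetical.

```python
import math
import random


def noisy_dot(w, x, sigma, rng):
    """One analog evaluation: each weight is read with fresh Gaussian noise."""
    return sum((wi + rng.gauss(0.0, sigma)) * xi for wi, xi in zip(w, x))


def averaged_dot(w, x, sigma, n_repeats, rng):
    """Temporal averaging: repeat the same noisy evaluation and take the mean."""
    return sum(noisy_dot(w, x, sigma, rng) for _ in range(n_repeats)) / n_repeats


def sine_activation(z, omega0=30.0):
    """SIREN-style periodic activation sin(omega0 * z)."""
    return math.sin(omega0 * z)


def rms_error(n_repeats, trials=400, sigma=0.1, seed=0):
    """RMS deviation of the averaged output from the noise-free dot product."""
    rng = random.Random(seed)
    w, x = [0.5, -0.2, 0.8], [1.0, 2.0, -1.0]
    exact = sum(wi * xi for wi, xi in zip(w, x))
    errs = [(averaged_dot(w, x, sigma, n_repeats, rng) - exact) ** 2 for _ in range(trials)]
    return math.sqrt(sum(errs) / trials)


print(rms_error(1), rms_error(16))  # error shrinks roughly as 1/sqrt(n_repeats)
```

The trade-off is latency and energy: T repetitions cost roughly T times the work of one evaluation for a √T reduction in independent noise.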
Journal article (2026) - Hassen Aziza, Hanzhi Xun, Moritz Fieback, Mottaqiallah Taouil, Said Hamdioui
Vector–matrix multiplication (VMM), implemented through multiply–accumulate (MAC) operations, represents the dominant computational primitive in many artificial intelligence (AI) workloads. When executed on conventional von Neumann architectures, VMM operations suffer from high energy consumption and long latency due to the separation between memory and processing units. To overcome these limitations, crossbar arrays built from Resistive Random Access Memory (RRAM) cells have been proposed for accelerating VMM computations. In this work, we investigate the key optimization trade-offs associated with implementing RRAM-based neural networks for classification applications. A simple two-layer neural network is first defined and trained in software to generate the weight matrices and bias parameters. Next, three hardware implementation scenarios are evaluated depending on whether negative floating-point numbers and biases are used: Positive Weights Only (PWO), Positive and Negative Weights Only (PNWO), and Positive and Negative Weights with Biases (PNWB). The different implementations are analyzed at the hardware level by examining classification accuracy, energy efficiency, latency, and area overhead. The study further incorporates important RRAM limitations, including restricted conductance range and device variability. Hardware results show that the PWO scenario offers the lowest energy consumption (189 fJ/MAC) and area overhead but results in the lowest accuracy. PNWO and PNWB significantly improve accuracy (+177% and +180%, respectively) but increase energy consumption (+63% and +87%) and area (×2 and ×2.1). Under device variability, PWO achieves the highest accuracy (94.65%), followed by PNWO (93.11%) and PNWB (92.11%). ...
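Supporting negative weights in a crossbar usually means storing each weight on a pair of devices whose difference sets the effective weight, which doubles the device count. The sketch below shows one common differential mapping, assumed for illustration rather than taken from the paper; `map_weights_differential` and `recover_weights` are hypothetical names, and the conductance range is made up.

```python
def map_weights_differential(weights, g_min, g_max):
    """Map signed weights onto (G_plus, G_minus) conductance pairs.

    Assumed scheme: a positive weight raises G_plus, a negative weight raises
    G_minus, and the other device of the pair stays at g_min, so the column
    current is proportional to (G_plus - G_minus). All values stay in [g_min, g_max].
    """
    w_max = max((abs(w) for w in weights), default=0.0) or 1.0
    scale = (g_max - g_min) / w_max  # siemens per unit of weight
    pairs = []
    for w in weights:
        delta = abs(w) * scale
        pairs.append((g_min + delta, g_min) if w >= 0 else (g_min, g_min + delta))
    return pairs, scale


def recover_weights(pairs, scale):
    """Effective weight seen by the differential readout."""
    return [(g_plus - g_minus) / scale for g_plus, g_minus in pairs]


pairs, scale = map_weights_differential([0.5, -1.0, 0.25, 0.0], g_min=1e-6, g_max=1e-4)
print(recover_weights(pairs, scale))  # approximately [0.5, -1.0, 0.25, 0.0]
```

Because the full weight range must fit into the restricted conductance window, the mapping scale shrinks as the largest |w| grows, which is one way device variability can hurt the signed schemes more.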
Binary Neural Networks (BNNs) have obtained a strong foothold in the field of machine learning at the edge due to their minimal hardware requirements. However, their energy and performance efficiency remain hindered by frequent data transfer between memory and processors. Computation-in-memory (CIM) architectures address this problem by embedding processing units within the memory. Unfortunately, current implementations of CIM are susceptible to IP piracy attacks through side channels. This paper presents a novel secure periphery scheme for NN accelerators with sequential accumulation that conceals IP information by obscuring the power consumption of the counter responsible for the leakage. This is achieved by combining two innovative techniques: operand schedule randomization and an always-count Gray code counter. The results demonstrate that the proposed design effectively resists power side-channel attacks (SCAs). Moreover, Signal-to-Noise Ratio (SNR) and Test Vector Leakage Assessment (TVLA) analyses show safe leakage levels. Compared to the state-of-the-art, our countermeasure reduces area and power overheads by up to 12.7× and 13.3×, respectively, incurring only 37% area and 51.2% power overhead for the added protection logic. Notably, this enhanced security comes with zero latency overhead, maintaining the performance of the baseline design. ...
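Under the Hamming-distance power model, a counter leaks through how many bits flip on each update. A plain binary counter flips between 1 and n bits depending on its value, while a Gray-code counter flips exactly one bit per increment; shuffling the accumulation order leaves the final count unchanged. The sketch below illustrates only these two properties; it does not model the paper's always-count mechanism, and all names are hypothetical.

```python
import random


def gray(n):
    """Binary-reflected Gray code of n: successive codes differ in exactly one bit."""
    return n ^ (n >> 1)


def hamming_distance(a, b):
    """Number of bit flips between two register states (Hamming-distance power model)."""
    return bin(a ^ b).count("1")


def transition_hds(states):
    """Hamming distance of every successive register update."""
    return [hamming_distance(a, b) for a, b in zip(states, states[1:])]


def accumulate_shuffled(products, rng):
    """Accumulate 1-bit products in a random order; the final count is unchanged."""
    order = list(range(len(products)))
    rng.shuffle(order)
    count = 0
    for i in order:
        count += products[i]
    return count


print(set(transition_hds(list(range(16)))))               # binary counter: {1, 2, 3, 4}
print(set(transition_hds([gray(i) for i in range(16)])))  # Gray counter: {1}
print(accumulate_shuffled([1, 0, 1, 1, 0], random.Random(0)))  # 3
```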

Forming-free, multi-bit Pd/HfO2 ReRAM for energy-efficient neuromorphic computing

Memristor technology offers a promising route toward energy-efficient computing but faces challenges including resistance drift, variability, and the need for electroforming. Filamentary resistive random-access memory, one of the most studied memristive platforms, typically requires a high-voltage electroforming step to initiate conductive filaments, leading to increased power overhead and reduced endurance. Here we report HfO2-based forming-free memristive devices (PdNeuRAM) that operate at low voltages, support multi-bit functionality, and exhibit reduced variability. Through combined electrical and materials characterization, we identify a Pd-O-Hf interfacial configuration that lowers oxygen-vacancy formation and migration barriers, creating a dense network of shallow defect states. Together with a Ti top electrode acting as an oxygen reservoir and an ultrathin (5 nm) HfO2 layer, this interfacial engineering enables charge redistribution at room temperature and eliminates the need for electroforming. The fabricated devices provide tunable resistance states and reduce programming and read energy by 43% and 38%, respectively, in spiking neural network inference tasks. These results provide mechanistic insight into forming-free resistive switching and demonstrate the potential of Pd/HfO2 devices for energy-efficient neuromorphic computing. ...

Orienting to SPICE and Circuit Design

Journal article (2026) - Changhao Wang, Sicong Yuan, Nicolo Bellarmino, Danyang Chen, Hanzhi Xun, Lin Wang, Mottaqiallah Taouil, Moritz Fieback, Said Hamdioui, More Authors
Physics-based compact models for emerging non-volatile memories (NVMs) are often limited by the complex interactions of microscopic domains and defects that are difficult to capture analytically, resulting in reduced accuracy and simulation efficiency. To address this challenge, a machine learning (ML)-based approach is proposed using artificial neural networks (ANNs) trained entirely on device measurement data, enabling a direct translation of fabrication characteristics into SPICE-compatible circuit models. The resulting models achieve high accuracy (MSE: 0.724, adjusted R²: 0.998), significantly outperforming physics-based baselines with an 18× lower MSE for polarization and a two-order-of-magnitude precision improvement in FeFET current simulation, while accurately capturing the wake-up process. Furthermore, the model demonstrates robust out-of-distribution (OOD) extrapolation to unseen ferroelectric thicknesses and a 33.7% improvement in simulation speed. These results validate the ML-based approach as a highly efficient, SPICE-compatible solution for next-generation memory. ...
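The two accuracy metrics quoted above can be computed as follows. Adjusted R² penalizes the ordinary R² for the number of model inputs p, via 1 − (1 − R²)(n − 1)/(n − p − 1). The data values in this sketch are made up, and `n_features` is a hypothetical stand-in for the model's input count.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


y_meas = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical measurements
y_model = [1.0, 2.0, 3.0, 4.0, 6.0]  # hypothetical model predictions
print(mse(y_meas, y_model), adjusted_r2(y_meas, y_model, n_features=1))
```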
Journal article (2026) - A. V. Zegbroeck, E. V. Meirvenne, P. Anagnostou, F. Ciubotaru, C. Adelmann, S. Hamdioui, S. Cotofana
In theory, Majority logic, originally proposed in the '70s, enables more compact and efficient arithmetic implementations than its conventional Boolean counterpart. Nonetheless, CMOS-based Majority logic realizations remain challenging, as standard transistor-based approaches cannot directly exhibit majority behavior. However, recent exploration of beyond-CMOS technologies has revived interest in majority logic. In this work, we propose and analyze a novel approach to implementing the 3-input Majority gate (MAJ3) by means of piezoelectric materials. By leveraging their intrinsic electromechanical properties, we convert the digital input signals into mechanical deformations, which are accumulated in a transfer layer. Subsequently, we transform the combined deformation back to the electric domain with a piezoelectronic element designed to perform the majority function. We first present the underlying principles behind our proposal with a short introduction to majority logic, piezoelectronics, and the utilized simulation framework. Afterwards, we introduce the proposed piezoelectric 3-input Majority gate (piezo-MAJ3) and strategies for optimizing its behavior and performance. We also detail the impact of material parameters and structural design on device performance using both analytical discussion and physics-based simulations. Finally, we briefly highlight how our proposal can be directly integrated into CMOS circuits and compare the potential cost and performance of the piezo-MAJ3 with those of state-of-the-art implementations. Our results indicate that, compared with its CMOS counterpart, the piezo-MAJ3 gate requires half the area, is 7x faster, and consumes 44% less energy. ...
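The compactness claim for majority logic is easiest to see at the logic level: a full adder's carry is exactly MAJ3(a, b, cin), and the sum needs only two more majority gates plus two inverters. The sketch below checks this standard majority-based construction exhaustively in Python; it is a logic-level illustration independent of the piezoelectric device, and the function names are hypothetical.

```python
import itertools


def maj3(a, b, c):
    """3-input majority: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def full_adder_maj(a, b, cin):
    """Full adder from three MAJ3 gates and two inverters."""
    cout = maj3(a, b, cin)
    s = maj3(1 - cout, maj3(a, b, 1 - cin), cin)
    return s, cout


for a, b, cin in itertools.product((0, 1), repeat=3):
    total = a + b + cin
    assert full_adder_maj(a, b, cin) == (total % 2, total // 2)
print("majority-based full adder matches a + b + cin for all 8 inputs")
```

Fixing one input also turns MAJ3 into AND (third input 0) or OR (third input 1), so a single gate type covers both.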
This paper presents the first cryogenic characterization of Hot Carrier Degradation (HCD) in 5-V thick-oxide transistors fabricated in a 160-nm CMOS technology. HCD is significantly worse in nMOS devices at 4.2 K, leading to more severe degradation, especially of the threshold voltage and the linear-regime current. Contrary to expectations, pMOS devices exhibit a temporary performance improvement after stress, showing for the first time at 4.2 K an HCD-induced turn-around effect in threshold voltage and current. The threshold-voltage shift follows a power law with stress time, with a much higher exponent at 4.2 K than at 300 K for nMOS, but not for pMOS devices. The threshold-voltage shift also follows a power law with stress voltage, strongly accelerated for nMOS at 4.2 K, but unchanged for pMOS. ...
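A power law ΔV_th = A·tⁿ becomes a straight line in log-log coordinates, so the exponent n can be estimated by ordinary least squares on (log t, log ΔV_th); the same fit applies with stress voltage in place of stress time. The sketch below uses synthetic data generated from made-up values A = 2 mV and n = 0.45; `fit_power_law` is a hypothetical name, not the paper's extraction procedure.

```python
import math


def fit_power_law(x, y):
    """Least-squares fit of y = A * x**n in log-log space.

    Returns (A, n). Requires strictly positive x and y.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    n = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
    amp = math.exp(my - n * mx)
    return amp, n


# Synthetic (made-up) stress data: dVth = 2 mV * t**0.45, t in seconds.
t_stress = [1.0, 10.0, 100.0, 1000.0]
dvth = [2e-3 * t ** 0.45 for t in t_stress]
print(fit_power_law(t_stress, dvth))  # approximately (0.002, 0.45)
```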

An overview from bio-inspiration to hardware architectures and learning mechanisms

Journal article (2026) - Anteneh Gebregiorgis, Amirreza Yousefzadeh, Sherif Eissa, Muhammad Ali Siddiqi, Charlotte Frenkel, Friedemann Zenke, Sander Bohte, Abdulqader Nael Mahmoud, Said Hamdioui, More authors...
The endeavor to emulate the extraordinary efficiency and adaptability inherent in the human brain via spike-based neuromorphic computing presents significant potential across a diverse array of applications. The attainment of this objective necessitates the translation of biological principles into artificial systems, a task that continues to pose a complex challenge requiring a profound comprehension of the mechanisms by which neural systems produce robust computational outcomes. This tutorial paper provides a comprehensive overview of the foundational concepts and emerging design trends in spike-based neuromorphic computing, covering advances from materials and circuits to hardware architectures and learning mechanisms. It begins with an examination of key aspects of brain biology and their influence on neuromorphic design, followed by a brief discussion of biologically plausible neuron and synapse models. The paper then sets out the core principles and defining attributes of neuromorphic computing, highlighting the trade-offs and design choices underlying current implementations. Building on these foundations, it explores the critical properties of neuromorphic systems, surveys a variety of learning algorithms, and reviews hardware-level realizations of bioinspired neurons and synapses. Subsequent sections discuss state-of-the-art spiking neural network architectures, mapping and compilation strategies, and representative application domains. By providing this end-to-end perspective, the article aims to guide the development of future neuromorphic systems that more closely emulate brain efficiency, scalability, and resilience. ...
Mapping Binary Neural Networks (BNNs) on computation-in-memory (CIM) architectures enables a highly efficient approach for energy-constrained edge computing. In-memory processing significantly reduces critical performance bottlenecks in conventional architectures. Despite their efficiency, current optimized CIM implementations remain vulnerable to IP theft via side-channel analysis. This work investigates the side-channel leakage of a digital BNN-CIM accelerator that employs popcount-based accumulation. A range of circuit-level modifications in counter implementations are proposed and evaluated, exploring their impact on security metrics and design overhead. Results demonstrate that the Hamming weight (HW) and Hamming distance (HD) equalizing techniques combined with power equalization through duplication perform better than traditional dual-rail countermeasures. The findings provide practical guidance for designing secure and efficient peripheral components for popcount-based BNN accelerators. ...
Conference paper (2025) - Hanzhi Xun, Moritz Fieback, Sicong Yuan, Changhao Wang, Erbing Hua, Hassen Aziza, Rajendra Bishnoi, Mottaqiallah Taouil, Said Hamdioui, More Authors...
Addressing non-idealities in Resistive Random Access Memories (RRAMs) is crucial for their successful commercialization. For example, the inherent resistance drift that occurs during consecutive read operations can induce Read Disturb Faults (RDF), leading to functional errors. This paper analyzes and characterizes the resistance drift and the RDF based on data measurements and presents a physics-based RRAM compact model that incorporates these non-idealities. Additionally, an in-field mitigation scheme is proposed, leveraging bidirectional read operations to balance the resistance. The scheme is implemented and validated through circuit simulations, both for RRAM used as memory and for RRAM-based computation-in-memory microarchitectures for deep neural networks. The results demonstrate that RRAM without any mitigation scheme can start failing after 8,000 consecutive reads, while our mitigation scheme ensures that the memory remains functional even after 10⁶ consecutive reads. Furthermore, using the MNIST dataset as a case study, the results indicate that the accuracy can drop significantly from 86% to as low as 12.5% without any mitigation scheme. In contrast, the proposed mitigation scheme restores the accuracy to up to 84.2%. ...
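Why alternating read polarity helps can be seen with a deliberately simple, hypothetical model: each forward-polarity read nudges the resistance by +δ and each reverse-polarity read by −δ. This symmetric-disturb assumption is made up for illustration and is not the paper's physics-based compact model; `drift_after_reads` is a hypothetical name.

```python
def drift_after_reads(n_reads, delta, bidirectional):
    """Toy read-disturb model: each read shifts resistance by +delta or -delta.

    Assumes a symmetric disturb (hypothetical): forward reads add delta,
    reverse reads subtract it. Returns (final offset, worst offset seen).
    """
    offset = 0.0
    worst = 0.0
    for i in range(n_reads):
        forward = (i % 2 == 0) or not bidirectional
        offset += delta if forward else -delta
        worst = max(worst, abs(offset))
    return offset, worst


print(drift_after_reads(8_000, 0.5, bidirectional=False))  # (4000.0, 4000.0)
print(drift_after_reads(10**6, 0.5, bidirectional=True))   # (0.0, 0.5)
```

With one polarity the offset grows linearly with the read count, while alternating polarities bound it at a single step; real devices are unlikely to be perfectly symmetric, so the residual drift in practice depends on measured behavior.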
Conference paper (2025) - Rajendra Bishnoi, Mohammad Amin Yaldagard, Said Hamdioui, Kanishkan Vadivel, Manolis Sifalakis, Nicolas Daniel Rodriguez, Pedro Julian, Lothar Ratschbacher, Maen Mallah, More Authors...
The goal of the NEUROKIT2E project is to create an open-source Deep Learning framework for edge and embedded AI built around an established European value chain. This framework, called AIDGE, supports a wide range of application areas that operate independently and serve a global user community. It provides easy and fast full-stack solutions from Neural Network design and optimization to AI application development all the way down to hardware implementations while enabling code generation for application-specific targets. The platform gives academic users in the AI domain the flexibility to explore and innovate while also allowing them to prototype systems, ensuring their work aligns well with industrial needs. This paper presents the results and achievements of the first part of this three-year project, along with its roadmap and expected outcomes. ...
Journal article (2025) - Jeroen J.A. Vermeulen, Georgii Krivoshein, Sumit Diware, Muhammad Ali Siddiqi, Arn M.J.M. van den Maagdenberg, Else A. Tolner, Said Hamdioui, Rajendra Bishnoi
Approximately one-third of individuals with chronic epilepsy, a condition resulting from uncontrolled brain activity, do not respond to medication. Animal models are widely used to investigate the mechanism underlying epilepsy so that better drug treatments can be developed for this disease. In such studies, epileptiform activity, assessed by EEG recordings, can be used as a marker for the development of the disease. However, the analysis of EEG recordings is typically done manually, which is time-consuming, subject to observer bias, error-prone, and lacks consistency and efficiency. In this paper, we develop a novel automated methodology for detecting and classifying epileptiform activity, which is tested using the intrahippocampal kainic acid (IHKA) mouse model, a representation of human temporal lobe epilepsy. For that, EEG/LFP recordings are obtained from biological experiments using the IHKA mouse model for data acquisition. We use a spike detection method that combines an improved version of the nonlinear energy operator (NEO) with the automatic NEO thresholding (ANT) algorithm. The proposed method is implemented in Python as an automated and time-efficient algorithm, and its adaptability to different spike and epileptiform event criteria makes it suitable for use in preclinical and potentially future clinical studies. Using our proposed methodology, we achieve a 93.1% accuracy in detecting epileptiform events and a 95.8% accuracy in classification. Moreover, the analysis time for EEG recordings is reduced by 98.8% compared to manual analysis. Additionally, to demonstrate the potential of the algorithm for brain–machine interface (BMI) applications, we develop a hardware architecture and implement it using both an application-specific integrated circuit (ASIC) and a field programmable gate array (FPGA). The FPGA shows the feasibility of near real-time implementation, and for our ASIC implementation, we achieve a post-layout area of 9114 µm² with a dynamic power consumption of 16.09 μW using TSMC 40 nm technology. ...
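The classic nonlinear energy operator is ψ[n] = x[n]² − x[n−1]·x[n+1]; it responds strongly to samples that are both large and sharply changing, which is what makes spikes stand out. The sketch below uses this classic form with a fixed threshold of c times the mean of ψ as a simple stand-in: it is neither the paper's improved NEO nor the ANT algorithm, and the factor c = 8 and the function names are assumptions.

```python
def neo(x):
    """Classic nonlinear energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def detect_spikes(x, c=8.0):
    """Return sample indices of x where NEO exceeds c * mean(NEO).

    The scaled-mean threshold is a simple stand-in, not the ANT algorithm.
    """
    psi = neo(x)
    thr = c * sum(psi) / len(psi)
    # psi[k] corresponds to sample k + 1 of x
    return [k + 1 for k, p in enumerate(psi) if p > thr]


# Synthetic trace: small alternating baseline with one large spike at sample 50.
trace = [0.1 * (-1) ** i for i in range(100)]
trace[50] = 5.0
print(detect_spikes(trace))  # [50]
```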

Constant Column Current Memristor-Based Computation-in-Memory Micro-Architecture

Advancements in Artificial Intelligence (AI) and Internet-of-Things (IoT) have increased demand for edge AI, but deployment on traditional AI accelerators, like GPUs and TPUs, that use the von Neumann architecture suffers from inefficiencies due to separate memory and compute units. Computation-in-Memory (CIM), utilizing non-volatile memristor devices to leverage analog computing principles and perform in-place computations, holds great potential in improving computational efficiency by eliminating frequent data movement. However, standard CIM implementations face several challenges, primarily high power consumption and the resulting nonlinearity, calling into question their viability for edge devices. In this paper, we propose C3CIM, a novel memristor-based CIM micro-architecture, featuring a new bit-cell and array design, targeting efficient implementation of Neural Networks (NN). Our architecture uses a constant current source to perform Multiply-and-Accumulate (MAC) operations with a very low computation current (10 to 100 nA), thereby significantly enhancing power efficiency. We adapted C3CIM for Spiking Neural Networks (SNN) and developed a prototype in a TSMC 40 nm CMOS node for on-silicon validation. Furthermore, our micro-architecture was benchmarked using two SNN models based on the N-MNIST and IBM-Gesture datasets, for comparison against the current state-of-the-art (SOTA). Results show up to a 35x reduction in power along with a 6.7x saving in energy compared to SOTA, demonstrating the promising potential of this work for edge AI applications. ...
Journal article (2025) - Karan Pathak, Joshua Klein, Giovanni Ansaloni, Said Hamdioui, Georgi Gaydadjiev, Marina Zapater, David Atienza
Full-System (FS) simulation is essential for performance evaluation of complete systems that execute complex applications on a complete software stack consisting of an operating system and user applications. Nevertheless, FS simulators require careful fine-tuning against real hardware to obtain reliable performance statistics, which can become tedious, error-prone, and time-consuming with typical trial-and-error approaches. To address these shortcomings, we propose a novel, streamlined, component-level calibration methodology for validating FS simulation models. Our methodology greatly accelerates the validation process without sacrificing accuracy. It is Instruction Set Architecture (ISA)-agnostic and can tackle hardware specifications at different levels of detail. We demonstrate its effectiveness by validating FS models against both open-hardware and IP-protected (closed hardware) RISC-V silicon, achieving a mean error of 19%-23% for the SPEC CPU2017 suite in the two cases. We introduce the first open-source RISC-V-based FS-validated simulation models with a complete and replicable methodology. ...
Journal article (2025) - A.E. El Arrassi, L.C.A. Huijbregts, Manil Dev Gomony, Anteneh Gebregiorgis, Francky Catthoor, M. Taouil, Rajiv V. Joshi, S. Hamdioui
With the rise of energy-constrained smart edge applications, there is a pressing need for energy-efficient computing engines that process generated data locally, at least for small and medium-sized applications. To address this issue, this paper proposes DREAM-CIM, a digital SRAM-based computation-in-memory (CIM) accelerator. It targets an energy- and area-efficient implementation of the multiply-and-accumulate (MAC) operation, which is the core operation of neural networks. The accelerator is based on a multi-sub-array macro to increase parallelism, integrates multiplication operations within the memory cells such that they are executed while reading the cells, makes use of pipelining to further optimize the throughput of the MAC operations, and eliminates the expensive adder-tree structures commonly used in State-of-The-Art (SOTA) digital CIM solutions by replacing them with a custom accumulation circuit to reduce power and area. The SPICE simulation results of the DREAM-CIM accelerator show an energy efficiency of 5097 TOPS/W (normalized to a 1-bit × 1-bit MAC operation) and an area efficiency of 3854 TOPS/mm² using a 22 nm technology node. The obtained circuit-level results were fed into a Python-based system-level simulator to benchmark the system architecture using two applications, i.e., image classification (using the MNIST and CIFAR-10 datasets on the LeNet5 and ResNet-20 models) and object detection (using the COCO dataset on the YOLOv6 model). The system-level results show that DREAM-CIM consumes 0.1 mJ, 0.2 mJ, and 11.02 mJ per inference for the MNIST, YOLOv6, and CIFAR-10 workloads, respectively, while maintaining SOTA accuracy. ...
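Normalizing to 1-bit × 1-bit MAC operations reflects how a digital CIM macro decomposes a multi-bit MAC: a b_x-bit input times a b_w-bit weight becomes b_x·b_w one-bit AND products, each shifted by its bit significance and accumulated. The sketch below shows this decomposition for unsigned integers; it is an illustration of the general bit-serial principle, not DREAM-CIM's accumulation circuit, and `bitserial_mac` is a hypothetical name.

```python
def bitserial_mac(inputs, weights, in_bits=4, w_bits=4):
    """Unsigned dot product built only from 1-bit x 1-bit ANDs plus shift-and-add.

    Each AND stands in for a 1b x 1b multiply performed while reading a cell;
    the per-plane sum and shift stand in for the accumulation circuit.
    """
    acc = 0
    for i in range(in_bits):        # input bit plane (applied one bit at a time)
        for j in range(w_bits):     # weight bit plane (one stored bit per cell)
            partial = sum(((x >> i) & 1) & ((w >> j) & 1) for x, w in zip(inputs, weights))
            acc += partial << (i + j)  # weight the partial sum by 2^(i+j)
    return acc


print(bitserial_mac([3, 5, 7, 1], [2, 4, 6, 9]))  # 77, same as 3*2 + 5*4 + 7*6 + 1*9
```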
Introduction: In 2012, potassium and sodium ion channels in Hodgkin-Huxley-based brain models were shown to exhibit memristive behavior. This positioned memristors as strong candidates for implementing biologically accurate artificial neurons. Memristor-based brain simulations offer advantages in energy efficiency, scalability, and compactness, benefiting fields such as soft robotics, embedded systems, and neuroprosthetics. Methods: Previous approaches used current-controlled Mott memristors, which poorly matched the voltage-controlled nature of ion channels. This study employs volatile, oxide-based memristors that leverage electric-field-driven oxygen-vacancy migration to emulate voltage-dependent channel behavior. We selected candidate WOx and NbOx memristors and modeled their dynamics to verify performance as Hodgkin-Huxley potassium channels. Results: The device exhibits sigmoidal gating and voltage-dependent time constants consistent with the theoretical model. By scaling the passive circuitry around the memristors, we show that they capture the essential mechanisms of potassium ion-channels, although spike height is reduced due to strong non-linear voltage-dependence. Still, by cascading multiple compartments, typical spike propagation is retained. Discussion: This is the first demonstration of a voltage-controlled memristor replicating the Hodgkin-Huxley potassium channel, validating its potential for more efficient brain simulation hardware. ...
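For reference, the target behavior the memristors are meant to reproduce is the textbook Hodgkin-Huxley potassium channel: I_K = g_K·n⁴·(V − E_K), with a gating variable n whose steady state n∞(V) is sigmoidal and whose time constant τ_n(V) depends on voltage. The sketch below uses the commonly tabulated rate functions in the modern convention (rest near −65 mV, g_K = 36 mS/cm², E_K = −77 mV); these constants are assumed here and describe the biological model, not the WOx or NbOx devices.

```python
import math


def alpha_n(v):
    """K+ activation opening rate (1/ms), V in mV (classic HH fit, rest near -65 mV)."""
    x = v + 55.0
    if abs(x) < 1e-9:  # removable singularity at V = -55 mV; the limit is 0.1
        return 0.1
    return 0.01 * x / (1.0 - math.exp(-x / 10.0))


def beta_n(v):
    """K+ activation closing rate (1/ms)."""
    return 0.125 * math.exp(-(v + 65.0) / 80.0)


def n_inf(v):
    """Steady-state gating variable: sigmoidal in V."""
    a, b = alpha_n(v), beta_n(v)
    return a / (a + b)


def tau_n(v):
    """Voltage-dependent gating time constant (ms)."""
    return 1.0 / (alpha_n(v) + beta_n(v))


def potassium_current(v, n, g_k=36.0, e_k=-77.0):
    """I_K = g_K * n^4 * (V - E_K), in uA/cm^2 for g_K in mS/cm^2 and V in mV."""
    return g_k * n ** 4 * (v - e_k)


for v in (-80.0, -65.0, -40.0, 0.0):
    print(v, round(n_inf(v), 3), round(tau_n(v), 2))
```

Matching a device against these two curves, n∞(V) and τ_n(V), is one way to check the sigmoidal gating and voltage-dependent time constants reported in the Results.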