S.S. Diware

21 records found

Journal article (2026) - Mohammad Amin Yaldagard, Ankit Bende, Sumit Diware, Vikas Rana, Said Hamdioui, Rajendra Bishnoi
Resistive random-access memory (RRAM)-based computation-in-memory (CIM) architectures offer a promising solution to meet the stringent energy efficiency demands of executing artificial intelligence (AI) algorithms directly on edge devices. However, these architectures suffer from the read-disturb problem, which can lead to accumulated computational errors over time. To maintain the required level of computational accuracy, conventional approaches rely on a static reprogramming process after a predefined number of read cycles, necessitating large counters and resulting in inefficiencies. This paper presents experimental results using real RRAM devices to analyze the read-disturb effect and builds on these insights to propose a circuit-level detection methodology for real-time monitoring of conductance drifts. The proposed method initiates reprogramming only when the device drift exceeds a defined threshold and reprogramming is actually needed. Additionally, an analytical method is developed to determine the minimum conductance state ratio needed to meet reliable detection criteria. Based on this foundation, the proposed detection technique is further optimized for dynamic identification of read-disturb effects. Experiment-augmented SPICE simulation results, using a calibrated model implemented in TSMC 40 nm CMOS technology, validate the functionality and effectiveness of the proposed detection approach. These results demonstrate its potential to improve both the reliability and efficiency of RRAM-based CIM architectures, with up to a 4x improvement in energy efficiency compared to traditional periodic reprogramming methods. ...
Timely identification of cardiac arrhythmia (abnormal heartbeats) is vital for early diagnosis of cardiovascular diseases. Wearable healthcare devices facilitate this process by recording heartbeats through electrocardiogram (ECG) signals and using AI-driven hardware to classify them into arrhythmia classes. Spiking neural networks (SNNs) are well-suited for such hardware as they consume low energy due to event-driven operation. However, their energy-efficiency and accuracy are constrained by encoding methods that translate real-valued ECG data into spikes. In this paper, we present an SNN-based ECG classification architecture featuring a new adaptive multi-threshold spike encoding scheme. This scheme adjusts encoding window and granularity based on the importance of ECG data samples, to capture essential information with fewer spikes. We develop a high-accuracy SNN model for such spike representation, by proposing a technique specifically tailored to our encoding. We design a hardware architecture for this model, which incorporates optimized layer post-processing for energy-efficient data-flow and employs fixed-point quantization for computational efficiency. Moreover, we integrate this architecture with our encoding scheme into a system-on-chip implementation using TSMC 40 nm technology. Our approach provides up to 5.1x energy-efficiency compared to state-of-the-art SNN-based ECG classifiers, with high accuracy. ...
Memristor-based Computation-In-Memory (CIM) has emerged as a compelling paradigm for designing energy-efficient neural network hardware. However, memristors suffer from conductance variation, which introduces computational errors in CIM hardware and leads to degraded inference accuracy. In this paper, we present a hardware-aware quantization scheme to mitigate the impact of conductance variation on CIM-based neural networks. We achieve this using the inherent characteristics of fixed-point arithmetic in CIM hardware. By tuning the bit-precision of weights, we align the conductance variation-induced errors with lower-order output bits. This reduces their numerical impact on the fixed-point output. We further decrease the residual errors by selectively discarding bits with low information and high error. This leads to error-free computations and a high inference accuracy. Our proposed methodology achieves 5.6× correct operations per unit energy compared to the conventional approach, while incurring very low hardware overheads. ...
Conference paper (2025) - A. Sehgal, A. Kumar Shukla, S. Diware, S. Soni, S. Dhull, S. Shreya, S. Roy, R.K. Bishnoi
Computational-In-Memory (CIM) architectures have emerged as energy-efficient solutions for Artificial Intelligence (AI) applications, enabling data processing within memory arrays and reducing the bottleneck associated with data transfer. The rapid advancement of AI demands real-time on-chip learning, but implementing this with CIM architectures poses significant challenges, such as limited parallelism and energy-efficiency during training and inference. In this paper, we propose a novel CIM architecture specifically designed for on-chip learning applications, which capitalizes on the unique properties of Spin-Orbit Torque (SOT) technology to enhance both parallelism and energy-efficiency in computation. The proposed architecture incorporates a bulk-write mechanism for SOT-cell based arrays, enabling efficient weight updates during on-chip training. Additionally, we develop a scheme to process vector elements concurrently for vector-matrix multiplications during inference. To achieve this, we design multi-port bit-cell access capabilities along with their associated control mechanisms. Simulation results show a 5.82× reduction in latency and a 3.20× improvement in energy-efficiency compared to standard SOT-MRAM based CIM, with negligible overhead. ...
Conference paper (2025) - A. Sehgal, S. Soni, S. Diware, A. K. Shukla, S. Roy, R. Bishnoi
Computational-In-Memory (CIM) is an energy-efficient paradigm that integrates computation directly within memory arrays, reducing the bottleneck associated with data transfer. This approach is beneficial for Artificial Intelligence (AI) applications that require on-chip learning for real-time processing. However, implementing on-chip learning in CIM architectures remains challenging due to limited throughput and energy-efficiency during both online training and inference. In conventional architectures, weight updates require the inference process to halt to avoid unintended computation outcomes. To overcome this limitation, this paper presents a novel Spin-Orbit Torque (SOT)-based CIM architecture tailored for continuous on-chip learning applications, which enables weight updates without interrupting the inference. The proposed SOT bit-cell utilizes a two-read-port, one-write-port (2R1W) configuration, where one read port (1R) is dedicated to inference and one read port and one write port (1R1W) are dedicated to on-chip learning, enabling concurrent read and write operations. Our proposed architecture is evaluated at the system-level using the Generic-PDK 45 nm technology node, demonstrating 2.4× improvement in energy-efficiency and 5.4× improvement in throughput compared to state-of-the-art solutions, with minimal overhead. ...
Journal article (2025) - Jeroen J.A. Vermeulen, Georgii Krivoshein, Sumit Diware, Muhammad Ali Siddiqi, Arn M.J.M. van den Maagdenberg, Else A. Tolner, Said Hamdioui, Rajendra Bishnoi
Approximately one-third of individuals with chronic epilepsy, a condition resulting from uncontrolled brain activity, do not respond to medication. Animal models are widely used to investigate the mechanism underlying epilepsy, so better drug treatments can be developed for this disease. In such studies, epileptiform activity, assessed by EEG recordings, can be used as a marker for the development of the disease. However, the analysis of EEG recordings is typically done manually, which is time-consuming, subject to observer bias, error-prone, and lacks consistency and efficiency. In this paper, we develop a novel automated methodology for detecting and classifying epileptiform activity, which is tested using the intrahippocampal kainic acid (IHKA) mouse model, a representation of human temporal lobe epilepsy. For that, EEG/LFP recordings are obtained from biological experiments using the IHKA mouse model for data acquisition. We use a spike detection method that combines an improved version of the nonlinear energy operator (NEO) with the automatic NEO thresholding (ANT) algorithm. The proposed method is implemented in Python as an automated and time-efficient algorithm, given its adaptability to different spike and epileptiform event criteria, making it suitable for use in preclinical and potentially future clinical studies. Using our proposed methodology, we achieve a 93.1% accuracy in detecting epileptiform events and a 95.8% accuracy in classification. Moreover, the time for analysis of EEG recordings was reduced by 98.8% compared to manual analysis. Additionally, to demonstrate the potential of the algorithm for brain–machine interfaces (BMI) applications, we develop a hardware architecture and implement it using both an application-specific integrated circuit (ASIC) and a field programmable gate array (FPGA). 
The FPGA shows the feasibility of near real-time implementation, and for our ASIC implementation, we achieve a post-layout area of 9114 µm² with a dynamic power consumption of 16.09 µW using TSMC 40 nm technology. ...
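The abstract above names the nonlinear energy operator (NEO) as the basis of its spike detector. As a minimal sketch, the classic NEO, psi[n] = x[n]^2 - x[n-1]*x[n+1], can be written in Python as follows. This is not the improved NEO variant or the ANT thresholding algorithm from the paper, whose details are not given in this listing; the function names, the `scale` factor, and the mean-based threshold are illustrative assumptions.

```python
def neo(signal):
    """Classic nonlinear energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The two boundary samples have no full neighborhood and are left at 0.
    """
    out = [0.0] * len(signal)
    for n in range(1, len(signal) - 1):
        out[n] = signal[n] ** 2 - signal[n - 1] * signal[n + 1]
    return out


def detect_spikes(signal, scale=8.0):
    """Return indices whose NEO output exceeds scale * mean(NEO).

    The fixed-scale mean threshold is an assumption made for illustration,
    not the automatic thresholding described in the abstract.
    """
    energy = neo(signal)
    threshold = scale * sum(energy) / len(energy)
    return [n for n, e in enumerate(energy) if e > threshold]


# Hypothetical trace: flat baseline with one sharp deflection at index 10.
trace = [0.0] * 20
trace[10] = 5.0
spikes = detect_spikes(trace)
```

The operator responds to samples that are large relative to their immediate neighbors, which is why it is a common starting point for spike detection in EEG/LFP traces.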
Doctoral thesis (2024) - S.S. Diware, S. Hamdioui, R.K. Bishnoi
Artificial intelligence (AI) is rapidly becoming an integral part of many real-world products and services. This is mainly facilitated by the extensive computing resources provided by the cloud infrastructure. However, cloud-based AI processing suffers from drawbacks like high latency, huge network costs, data privacy/security concerns, and service disruptions due to internet outages. Edge computing for AI (edge-AI) addresses these problems by combining data sources with on-board AI processing hardware. Such hardware must be energy efficient to achieve prolonged operation, given the limited energy resources on edge devices. Moreover, it should be compact in size to facilitate seamless system integration and enhanced portability. Conventional hardware cannot meet these requirements due to the data transfer bottleneck in the von Neumann architecture and the limitations of conventional memory technologies.
Computation-in-memory (CIM) overcomes these challenges by in-situ data processing using emerging memory technologies called memristors. Thus, CIM can facilitate energy efficient and compact edge-AI hardware design. The healthcare domain stands out as a prime target for CIM-based edge-AI hardware, for two main reasons. Firstly, it holds significant real-world importance due to its direct impact on human well-being. Secondly, the increasing adoption of AI in healthcare can significantly benefit from efficient hardware for data processing. CIM-based edge hardware can greatly enhance the effectiveness of AI-based healthcare through rapid, reliable, and secure processing of medical data at its source. Hence, the design of CIM-based edge-AI hardware for healthcare applications presents a promising research direction.

The process of designing CIM-based edge-AI hardware for healthcare can be expressed as a stack of six abstraction layers: application, algorithm, optimization, mapping, micro-architecture and circuits, and device. These abstraction layers can be further grouped into two distinct design phases. The first phase is application-dependent, covering the first three abstraction layers (application, algorithm and optimization). It involves creating a customized neural network model for the given healthcare application. The challenge in this phase is to achieve strong algorithmic performance, while incorporating features to exploit the full potential of CIM hardware. Conversely, the second phase is application-independent and comprises the remaining abstraction layers (mapping, micro-architecture and circuits, and device). It solely focuses on translating the model computations into CIM hardware operations. However, the non-ideal characteristics of memristor devices introduce computational errors in hardware operations. This undermines the advantages of CIM, as energy-efficient computations are of no use if they are incorrect. Hence, mitigating memristor non-idealities becomes the primary challenge in this phase. Moreover, it is important to integrate the customized model and non-ideality mitigation strategies into a comprehensive hardware solution and realize it through prototyping. This gives rise to the following three research topics: 1) healthcare AI models for CIM-based edge hardware, 2) dealing with memristor non-idealities, and 3) CIM edge-AI prototyping for healthcare.

We adopt a cross-layer approach in this thesis to address these research topics, covering all six layers of the CIM abstraction stack. We begin by creating neural network models for two healthcare applications: cardiac arrhythmia classification and diabetic retinopathy screening. Our contributions in this application-dependent design phase span across the first three abstraction layers (application, algorithm and optimization). At the application layer, we introduce new features in the model tailored to the specific healthcare application. This enhances its real-world impact by addressing the unique medical needs more effectively. Moving to the algorithm layer, we customize the computational flow within the model to exploit the characteristics of the healthcare data. This improves design performance in key aspects like accuracy and energy efficiency. Moreover, we strategically refine the model computations to further maximize post-deployment benefits on CIM hardware. At the optimization layer, we employ techniques like resampling, quantization and pruning to optimize hardware resource requirements, without compromising the model's algorithmic performance.

After creating the neural network models, we proceed to the application-independent design phase. Focusing on RRAM-based memristor devices, we first identify three key non-idealities that significantly impact inference accuracy on CIM hardware. We then devise mitigation strategies against these non-idealities, encompassing the remaining abstraction layers (mapping, micro-architecture and circuits, and device). At the mapping layer, we propose a hardware-aware training methodology to combat the conductance variation non-ideality. Moving to the micro-architecture level, we present two mitigation strategies. The first addresses the non-zero Gmin error non-ideality through a novel approach to CIM micro-architecture design. The second introduces an adaptive micro-architecture that adjusts its sensing conditions to counteract the effects of the read-disturb non-ideality. At the device level, these strategies indirectly contribute by circumventing the necessity for extensive device engineering, ensuring accurate inference even in the presence of non-idealities. Building upon this foundation of model development and non-ideality mitigation, we integrate the optimal ECG classification model with the proposed mitigation strategies to create a CIM edge-AI prototype. Thus, our contributions pave the way towards a future with enhanced effectiveness and efficiency of AI-powered healthcare. ...
Computation-in-memory (CIM) using memristors can facilitate data processing within the memory itself, leading to higher energy efficiency than the conventional von Neumann architecture. This makes CIM well-suited for data-intensive applications like neural networks. However, a large number of read operations can induce an undesired resistance change in the memristor, known as read-disturb. As memristor resistances represent the neural network weights in CIM hardware, read-disturb causes an unintended change in the network’s weights that leads to poor accuracy. In this paper, we propose a methodology for read-disturb detection and mitigation in CIM-based neural networks. We first analyze the key insights regarding the read-disturb phenomenon. We then introduce a mechanism to dynamically detect the occurrence of read-disturb in CIM-based neural networks. In response to such detections, we develop a method that adapts the sensing conditions of CIM hardware to provide error-free operation even in the presence of read-disturb. Simulation results show that our proposed methodology achieves up to 2× accuracy and up to 2× correct operations per unit energy compared to conventional CIM architectures. ...
Journal article (2024) - Sumit Diware, Koteswararao Chilakala, Rajiv V. Joshi, Said Hamdioui, Rajendra Bishnoi
Diabetic retinopathy (DR) is a leading cause of permanent vision loss worldwide. It refers to irreversible retinal damage caused by elevated glucose levels and blood pressure. Regular screening for DR can facilitate its early detection and timely treatment. Neural network-based DR classifiers can be leveraged to achieve such screening in a convenient and automated manner. However, these classifiers suffer from a reliability issue: they exhibit strong performance during development but degraded performance after deployment. Moreover, they do not provide supplementary information about the prediction outcome, which severely limits their widespread adoption. Furthermore, energy-efficient deployment of these classifiers on edge devices remains unaddressed, which is crucial to enhance their global accessibility. In this paper, we present reliable and energy-efficient hardware for DR detection, suitable for deployment on edge devices. We first develop a DR classification model using custom training data that incorporates diverse image quality and image sources along with improved class balance. This enables our model to effectively handle both on-field variations in retinal images and minority DR classes, enhancing its post-deployment reliability. We then propose a pseudo-binary classification scheme to further improve the model performance and provide supplementary information about the model prediction. Additionally, we present an energy-efficient hardware design for our model using memristor-based computation-in-memory, to facilitate its deployment on edge devices. Our proposed approach achieves reliable DR classification with three orders of magnitude reduction in energy consumption over state-of-the-art hardware platforms. ...
Memristor-based computation-in-memory (CIM) can achieve high energy efficiency by processing the data within the memory, which makes it well-suited for applications like neural networks. However, memristors suffer from the conductance variation problem, where their programmed conductance values deviate from the desired values. Such variations lead to computational errors that result in degraded inference accuracy in CIM-based neural networks. In this paper, we present a mapping-aware biased training methodology to mitigate the impact of conductance variation on CIM-based neural networks. We first determine which conductance states of the memristor are inherently more immune to variation. The neural network is then trained under the constraint that important weights can only take numeric values which directly get mapped to such favorable states. Simulation results show that our proposed mapping-aware biased training achieves up to 2.4× hardware accuracy compared to the conventional training. ...
Resistive random access memory (RRAM) based computation-in-memory (CIM) architectures can meet the unprecedented energy efficiency requirements to execute AI algorithms directly on edge devices. However, the read-disturb problem associated with these architectures can lead to accumulated computational errors. To achieve the necessary level of computational accuracy, these devices must undergo a reprogramming process after a specific number of read cycles, which is a static approach and needs a large counter. This paper proposes a circuit-level RRAM read-disturb detection technique that monitors real-time conductance drifts of RRAM devices and initiates reprogramming only when it is actually needed. Moreover, an analytic method is presented to determine the minimum conductance detection requirements, and our proposed read-disturb detection technique is tuned to these requirements to detect read-disturb dynamically. SPICE simulation results using TSMC 40 nm technology show the correct functionality of our proposed detection technique. ...
Journal article (2023) - Sumit Diware, Abhairaj Singh, Anteneh Gebregiorgis, Rajiv V. Joshi, Said Hamdioui, Rajendra Bishnoi
The computation-in-memory (CIM) paradigm leverages emerging memory technologies such as resistive random access memories (RRAMs) to process the data within the memory itself. This alleviates the memory-processor bottleneck, resulting in much higher hardware efficiency compared to von-Neumann architecture-based conventional hardware. Hence, CIM becomes an attractive alternative for applications like neural networks, which require a huge number of data transfer operations in conventional hardware. CIM-based neural networks typically employ a bit-slicing scheme, which represents a single neural weight using multiple RRAM devices (called slices) to meet the high bit-precision demand. However, such neural networks suffer from significant accuracy degradation due to non-zero Gmin error, where a zero weight in the neural network is represented by an RRAM device with a non-zero conductance. This paper proposes an unbalanced bit-slicing scheme to mitigate the impact of non-zero Gmin error. It achieves this by allocating appropriate sensing margins for different slices based on their binary positions. It also tunes the sensing margins to meet the demands of either high accuracy or energy-efficiency. The sensing margin allocation is supported by 2's complement arithmetic, which further reduces the influence of non-zero Gmin error. Simulation results show that our proposed scheme achieves up to 7.3× accuracy and up to 7.8× correct operations per unit energy consumption compared to state-of-the-art. ...
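The non-zero Gmin error described in the abstract above can be illustrated with a small numeric sketch. It models one crossbar column holding single-bit slices, where a stored 0 is ideally a conductance of zero but in practice sits at some minimum conductance. The conductance values, names, and the single-bit-per-device assumption are all hypothetical, and the sketch shows only the error source, not the unbalanced bit-slicing scheme proposed in the paper.

```python
G_MIN = 1.0   # hypothetical conductance of a device storing bit 0 (arbitrary units)
G_MAX = 10.0  # hypothetical conductance of a device storing bit 1 (arbitrary units)


def column_current(bits, inputs, g_zero):
    """Sum of input * conductance along one column.

    g_zero is the conductance used for a stored 0: 0.0 in the ideal case,
    G_MIN for a real device.
    """
    return sum(x * (G_MAX if b else g_zero) for b, x in zip(bits, inputs))


def gmin_error(bits, inputs):
    """Extra column current caused by zero-bits that still conduct G_MIN."""
    ideal = column_current(bits, inputs, 0.0)
    actual = column_current(bits, inputs, G_MIN)
    return actual - ideal


# Eight rows, all inputs active, only two devices store a 1.
bits = [1, 0, 0, 1, 0, 0, 0, 0]
inputs = [1] * 8
error = gmin_error(bits, inputs)  # six zero-bits each contribute G_MIN
```

In this toy model the error grows with the number of active rows that store a 0, which is consistent with the abstract's point that sensing margins matter for distinguishing the intended output levels.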
Journal article (2023) - S.S. Diware, Sudeshna Dash, A.B. Gebregiorgis, Rajiv V. Joshi, C. Strydis, S. Hamdioui, R.K. Bishnoi
Timely detection of cardiac arrhythmia characterized by abnormal heartbeats can help in the early diagnosis and treatment of cardiovascular diseases. Wearable healthcare devices typically use neural networks to provide the most convenient way of continuously monitoring heart activity for arrhythmia detection. However, it is challenging to achieve high accuracy and energy efficiency in these smart wearable healthcare devices. In this work, we provide architecture-level solutions to deploy neural networks for cardiac arrhythmia classification. We have created a hierarchical structure after analyzing various neural network topologies where only required network components are activated to improve energy efficiency while maintaining high accuracy. In our proposed architecture, we introduce a severity-based classification approach to directly help the users of the wearable healthcare device as well as the medical professionals. Additionally, we have employed computation-in-memory based hardware to improve energy efficiency and area consumption by leveraging in-situ data processing and scalability of emerging memory technologies such as resistive random access memory (RRAM). Simulation experiments conducted using the MIT-BIH arrhythmia dataset show that the proposed architecture provides high accuracy while consuming average energy of 0.11 µJ per heartbeat classification and 0.11 mm² area, thereby achieving 25× improvement in average energy consumption and 12× improvement in area compared to the state-of-the-art. ...
Analog computation-in-memory (CIM) architecture alleviates massive data movement between the memory and the processor, thus promising great prospects to accelerate certain computational tasks in an energy-efficient manner. However, data converters involved in these architectures typically achieve the required computing accuracy at the expense of high area and energy footprint which can potentially determine CIM candidacy for low-power and compact edge-AI devices. In this work, we present a memory-periphery co-design to perform accurate A/D conversions of analog matrix-vector-multiplication (MVM) outputs. Here, we introduce a scheme where select-lines and bit-lines in the memory are virtually fixed to improve conversion accuracy and aid a ring-oscillator-based A/D conversion, equipped with component sharing and inter-matching of the reference blocks. In addition, we deploy a self-timed technique to further ensure high robustness addressing global design and cycle-to-cycle variations. Based on measurement results of a 4Kb CIM chip prototype equipped with TSMC 40nm, a relative accuracy of up to 99.71% is achieved with an energy efficiency of 115.1 TOPS/W and computational density of 12.1 TOPS/mm² for the MNIST dataset. This corresponds to an improvement of up to 11.3× and 7.5×, respectively, compared to the state-of-the-art. ...
Conference paper (2023) - Hassen Aziza, Cristian Zambelli, Said Hamdioui, Sumit Diware, Rajendra Bishnoi, Anteneh Gebregiorgis
Emerging device technologies such as Resistive RAMs (RRAMs) are under investigation by many researchers and semiconductor companies, not only to realize, e.g., embedded non-volatile memories, but also to enable energy-efficient computing making use of new data processing paradigms such as computation-in-memory. However, such devices suffer from various non-idealities and reliability failure mechanisms (e.g., variability, endurance, and retention); these negatively impact the memory robustness and the computation accuracy. This paper discusses the non-idealities and reliability failure mechanisms for RRAM devices and provides an overview of the most popular ones. In addition, it reports detailed analysis of some of these based on data measurements. Finally, it presents two different mitigation schemes for RRAM based accelerators; one is based on RRAM non-ideality aware quantization and conductance control for neural network accuracy enhancement while the second is based on reliability-aware biased training technique. ...
Conference paper (2023) - Rajendra Bishnoi, Sumit Diware, Hussam Amrouch, Said Hamdioui, Anteneh Gebregiorgis, Simon Thomann, Sara Mannaa, Bastien Deveautour, Cedric Marchand, Alberto Bosio, Damien Deleruyelle, Ian O'Connor
Deep Learning (DL) has recently led to remarkable advancements; however, it faces severe computation-related challenges. Existing Von-Neumann-based solutions are dealing with issues such as memory bandwidth limitations and energy inefficiency. Computation-In-Memory (CIM) has the potential to address this problem by integrating processing elements directly into the memory architecture, reducing data movement and enhancing the overall efficiency of the system. In this work, we propose CIM architectures using three distinct emerging technologies. Firstly, a CIM architecture utilizing Ferroelectric Field-Effect Transistors (FeFET) is shown and the resulting errors from the analog compute scheme are injected into the emerging algorithm of Hyperdimensional Computing. Subsequently, we explore Vertical Nanowire Field-Effect Transistors (VNWFETs) based CIM within a 3D computing architecture, demonstrating improved energy efficiency and reconfigurability for CIM. Additionally, we improve the accuracy of the Resistive Random Access Memories (RRAM) based CIM architecture using two mapping-based solutions. These three technologies exhibit non-volatile characteristics, and when integrated into the CIM architecture, they yield significant advantages, including enhanced energy efficiency, reliability, and accuracy in computing processes. ...
Computation-In-Memory (CIM) using memristor devices provides an energy-efficient hardware implementation of arithmetic and logic operations for numerous applications, such as neuromorphic computing and database query. However, memristor-based CIM suffers from various non-idealities such as conductance drift, read disturb, wire parasitics, endurance and device degradation. These negatively impact the computation accuracy of CIM. It is therefore essential to deal with these non-idealities and fabrication imperfections in order to harness the full potential of CIM. This paper discusses the non-ideality challenges and provides potential solutions. Furthermore, the paper outlines the potential future directions for CIM architectures. ...
Conference paper (2021) - Abhairaj Singh, Sumit Diware, Anteneh Gebregiorgis, Rajendra Bishnoi, Francky Catthoor, Rajiv V. Joshi, Said Hamdioui
With the rise of the Internet of Things (IoT), a huge market for so-called smart edge-devices is foreseen for millions of applications, like personalized healthcare and smart robotics. These devices have to bring smart computing directly where the data is generated, while coping with the limited energy budget. Conventional von-Neumann architectures fail to meet these requirements due to, e.g., the memory-processor data transfer bottleneck. Memristor-based computation-in-memory (CIM) has the potential to realize smart local computing for highly parallel data-dominated AI applications by exploiting the inherent properties of the architecture and the physical characteristics of the memristors. This paper provides a broad overview of CIM architecture highlighting its potential and unique properties in enabling smart local computing. Moreover, it discusses design considerations of such architectures including both crossbar array as well as peripheral circuits; special attention is given to the analog-to-digital converter (ADC), as it is the most critical unit of analog-based CIM operation, e.g., vector-matrix multiplication (VMM). Finally, the paper outlines the potential future directions for CIM-based edge smart computing. ...
Conference paper (2021) - Sumit Diware, Anteneh Gebregiorgis, Rajiv V. Joshi, Said Hamdioui, Rajendra Bishnoi
Emerging memristor-based computing has the potential to achieve higher computational efficiency over conventional architectures. A bit-slicing scheme, which represents a single neural weight using multiple memristive devices, is usually introduced in memristor-based neural networks to meet high bit-precision demands. However, the accuracy of such networks can be significantly degraded due to the non-zero minimum conductance (Gmin) of memristive devices. This paper proposes an unbalanced bit-slicing scheme; it uses smaller slice sizes for more important bits to provide higher sensing margin and reduces the impact of non-zero Gmin. Moreover, the unbalanced bit-slicing is assisted by 2’s complement arithmetic which further improves the accuracy. Simulation results show that our proposed scheme can achieve up to 8.8× and 1.8× accuracy compared to state-of-the-art for single-bit and two-bit configurations respectively, at reasonable energy overheads. ...