Enhancing Parallelism and Energy-Efficiency in SOT-MRAM based CIM Architecture for On-Chip Learning

Conference Paper (2025)
Author(s)

A. Sehgal (Indian Institute of Technology Roorkee)

A. Kumar Shukla (Madan Mohan Malaviya University of Technology)

S. Diware (TU Delft - Electrical Engineering, Mathematics and Computer Science)

S. Soni (Indian Institute of Technology Roorkee)

S. Dhull (GlobalFoundries)

S. Shreya (Aarhus University)

S. Roy (Indian Institute of Technology Roorkee)

R.K. Bishnoi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Programming Languages
DOI related publication
https://doi.org/10.1109/DAC63849.2025.11424425 Final published version
Publication Year
2025
Language
English
Publisher
IEEE
ISBN (print)
979-8-3315-0305-5
ISBN (electronic)
979-8-3315-0304-8
Event
2025 62nd ACM/IEEE Design Automation Conference (DAC) (2025-06-22 - 2025-06-25), San Francisco, United States
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Computational-In-Memory (CIM) architectures have emerged as energy-efficient solutions for Artificial Intelligence (AI) applications, enabling data processing within memory arrays and reducing the bottleneck associated with data transfer. The rapid advancement of AI demands real-time on-chip learning, but implementing this with CIM architectures poses significant challenges, such as limited parallelism and energy-efficiency during training and inference. In this paper, we propose a novel CIM architecture specifically designed for on-chip learning applications, which capitalizes on the unique properties of Spin-Orbit Torque (SOT) technology to enhance both parallelism and energy-efficiency in computation. The proposed architecture incorporates a bulk-write mechanism for SOT-cell based arrays, enabling efficient weight updates during on-chip training. Additionally, we develop a scheme to process vector elements concurrently for vector-matrix multiplications during inference. To achieve this, we design multi-port bit-cell access capabilities along with their associated control mechanisms. Simulation results show a 5.82× reduction in latency and a 3.20× improvement in energy-efficiency compared to standard SOT-MRAM based CIM, with negligible overhead.

Files

Taverne

File under embargo until 10-09-2026