Enhancing Parallelism and Energy-Efficiency in SOT-MRAM based CIM Architecture for On-Chip Learning

Conference Paper (2025)
Author(s)

A. Sehgal (Indian Institute of Technology Roorkee)

A. Kumar Shukla (Madan Mohan Malaviya University of Technology)

S. Diware (TU Delft - Computer Engineering, TU Delft - Programming Languages)

S. Soni (Indian Institute of Technology Roorkee)

S. Dhull (GlobalFoundries)

S. Shreya (Aarhus University)

S. Roy (Indian Institute of Technology Roorkee)

R.K. Bishnoi (TU Delft - Computer Engineering)

Research Group
Programming Languages
DOI
https://doi.org/10.1109/DAC63849.2025.11424425
Publication Year
2025
Language
English
Publisher
IEEE
ISBN (print)
979-8-3315-0305-5
ISBN (electronic)
979-8-3315-0304-8
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Computational-In-Memory (CIM) architectures have emerged as energy-efficient solutions for Artificial Intelligence (AI) applications, enabling data processing within memory arrays and reducing the bottleneck associated with data transfer. The rapid advancement of AI demands real-time on-chip learning, but implementing it with CIM architectures poses significant challenges, such as limited parallelism and energy-efficiency during training and inference. In this paper, we propose a novel CIM architecture specifically designed for on-chip learning, which capitalizes on the unique properties of Spin-Orbit Torque (SOT) technology to enhance both parallelism and energy-efficiency in computation. The proposed architecture incorporates a bulk-write mechanism for SOT-cell-based arrays, enabling efficient weight updates during on-chip training. Additionally, we develop a scheme to process vector elements concurrently for vector-matrix multiplications during inference. To achieve this, we design multi-port bit-cell access capabilities along with the associated control mechanisms. Simulation results show a $5.82\times$ reduction in latency and a $3.20\times$ improvement in energy-efficiency compared to standard SOT-MRAM based CIM, with negligible overhead.
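The two operations the abstract centers on — vector-matrix multiplication (VMM) during inference and row-wise weight updates during training — can be illustrated with a minimal functional sketch. This is a conceptual model only: the matrix shapes, binary cell states, and the `weights[row, :] = ...` bulk-write idiom are illustrative assumptions, not the paper's circuit-level design.

```python
import numpy as np

# Binary SOT-MRAM cell states, modeled as a 4x3 conductance matrix G
# (assumed shape for illustration; real arrays are much larger).
weights = np.array([[1, 0, 1],
                    [0, 1, 1],
                    [1, 1, 0],
                    [0, 0, 1]])

# Input vector applied to the rows (e.g., as read voltages).
inputs = np.array([1, 0, 1, 1])

# Inference: each column j accumulates I_j = sum_i V_i * G_ij.
# A multi-port bit-cell design lets several vector elements be
# applied and summed concurrently rather than one element per cycle.
outputs = inputs @ weights  # -> array([2, 1, 2])

# Training: a bulk-write updates an entire row of cells in one
# operation instead of programming cells one at a time.
weights[2, :] = np.array([0, 1, 1])
```

Here the speedup claimed in the abstract corresponds to replacing per-element (or per-cell) serial operations with the row-parallel reads and writes that the multi-port access scheme enables.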

Files

Taverne

File under embargo until 10-09-2026