NVM-in-Cache: Repurposing Commodity 6T SRAM Cache into NVM Analog Processing-in-Memory Engine using a Novel Compute-on-Powerline Scheme

📅 2025-09-15
🏛️ arXiv.org
🤖 AI Summary
To address the high SRAM area overhead and energy inefficiency arising from the memory–compute separation in DNN acceleration, this work proposes a 6T-2R hybrid bit-cell architecture: two RRAM devices are embedded into a standard 6T SRAM cell without additional area cost, enabling the first native integration of commercial SRAM caches with analog processing-in-memory (PIM). It introduces a Compute-on-Powerline mechanism that repurposes the power rail as a parallel analog-domain data path for multiply-accumulate (MAC) operations, supporting cache functionality and high-throughput AI computation simultaneously. Evaluated through circuit- and array-level simulations in GlobalFoundries 22 nm FDSOI technology, the design achieves 0.4 TOPS computational throughput and 491.78 TOPS/W energy efficiency. With 128-row parallelism, it executes ResNet-18 on CIFAR-10 with 91.27% accuracy.

📝 Abstract
The rapid growth of deep neural network (DNN) workloads has significantly increased the demand for large-capacity on-chip SRAM in machine learning (ML) applications, with SRAM arrays now occupying a substantial fraction of the total die area. To address the dual challenges of storage density and computation efficiency, this paper proposes an NVM-in-Cache architecture that integrates resistive RAM (RRAM) devices into a conventional 6T-SRAM cell, forming a compact 6T-2R bit-cell. This hybrid cell enables a Processing-in-Memory (PIM) mode, which performs massively parallel multiply-and-accumulate (MAC) operations directly on cache power lines while preserving stored cache data. By exploiting the intrinsic properties of the 6T-2R structure, the architecture achieves additional storage capability and high computational throughput without any bit-cell area overhead. Circuit- and array-level simulations in GlobalFoundries 22 nm FDSOI technology demonstrate that the proposed design achieves a throughput of 0.4 TOPS and an energy efficiency of 491.78 TOPS/W. With 128 row-parallel operations, CIFAR-10 classification is demonstrated by mapping a ResNet-18 neural network, achieving an accuracy of 91.27%. These results highlight the potential of the NVM-in-Cache approach to serve as a scalable, energy-efficient computing method by repurposing the existing 6T SRAM cache architecture for next-generation AI accelerators and general-purpose processors.
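The abstract's compute-on-powerline MAC can be illustrated with a small numerical model: each active row draws a current set by its RRAM cell's conductance, the shared line sums those currents (Kirchhoff's current law), and an ADC digitizes the total into a MAC result. The sketch below is only a behavioral illustration under assumed device values; the conductances, read voltage, and ADC resolution are placeholders, not figures from the paper.

```python
# Behavioral sketch of a current-domain analog MAC on a shared line,
# loosely modeling the compute-on-powerline idea. All device parameters
# below are illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

N_ROWS = 128                   # row-parallelism reported in the paper
G_LRS, G_HRS = 50e-6, 1e-6     # assumed RRAM conductances (S): weight 1 / weight 0
V_READ = 0.2                   # assumed read voltage (V)
ADC_BITS = 5                   # assumed ADC resolution

weights = rng.integers(0, 2, N_ROWS)   # binary weights stored as RRAM states
inputs = rng.integers(0, 2, N_ROWS)    # binary activations gate the rows

# Each activated row contributes a current proportional to its cell
# conductance; the shared power line accumulates them.
cond = np.where(weights == 1, G_LRS, G_HRS)
i_line = float(np.sum(inputs * cond * V_READ))

# An ADC quantizes the line current into a digital code, which is then
# rescaled back to an estimated MAC value.
i_max = N_ROWS * G_LRS * V_READ
code = int(round(i_line / i_max * (2**ADC_BITS - 1)))
mac_analog = int(round(code / (2**ADC_BITS - 1) * N_ROWS))

mac_ideal = int(np.dot(inputs, weights))
print("ideal MAC:", mac_ideal, "| analog estimate:", mac_analog)
```

The estimate deviates from the exact dot product because of ADC quantization and the small leakage current through high-resistance (weight-0) cells, which is one source of the accuracy gap analog PIM designs must manage.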
Problem

Research questions and friction points this paper is trying to address.

DNN workloads demand large on-chip SRAM, which now occupies a substantial fraction of die area
The memory–compute separation makes conventional DNN acceleration energy-inefficient
Storage density and computation efficiency must improve without adding bit-cell area
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embeds two RRAM devices into a standard 6T-SRAM cell to form a 6T-2R bit-cell
Introduces a Compute-on-Powerline scheme that performs MAC operations directly on cache power lines
Enables analog processing-in-memory without bit-cell area overhead while preserving cache data
Subhradip Chakraborty
Department of Electrical and Computer Engineering, University of Wisconsin Madison, Madison, USA
Ankur Singh
Applied Scientist at Adobe
Xuming Chen
Department of Electrical, Computer, and Systems Engineering, Case School of Engineering, Case Western Reserve University, USA
G. Datta
Department of Electrical, Computer, and Systems Engineering, Case School of Engineering, Case Western Reserve University, USA
Akhilesh R. Jaiswal
Department of Electrical and Computer Engineering, University of Wisconsin Madison, Madison, USA