NVM-in-Cache: Repurposing Commodity 6T SRAM Cache into NVM Analog Processing-in-Memory Engine using a Novel Compute-on-Powerline Scheme

📅 2025-09-15
🏛️ arXiv.org
🤖 AI Summary
To address the high SRAM area overhead and energy inefficiency arising from the memory–compute separation in DNN acceleration, this work proposes a 6T-2R hybrid bit-cell architecture: two RRAM devices are embedded into a standard 6T SRAM cell without additional area cost, enabling the first native integration of commercial SRAM caches with analog processing-in-memory (PIM). It introduces a Compute-on-Powerline mechanism that repurposes the power rail as a parallel analog-domain data path for multiply-accumulate (MAC) operations, supporting cache functionality and high-throughput AI computation simultaneously. Evaluated through circuit- and array-level simulations in GlobalFoundries 22 nm FDSOI technology, the design achieves 0.4 TOPS computational throughput and 491.78 TOPS/W energy efficiency. With 128-row parallelism, it executes ResNet-18 on CIFAR-10 with 91.27% accuracy.

📝 Abstract
The rapid growth of deep neural network (DNN) workloads has significantly increased the demand for large-capacity on-chip SRAM in machine learning (ML) applications, with SRAM arrays now occupying a substantial fraction of the total die area. To address the dual challenges of storage density and computation efficiency, this paper proposes an NVM-in-Cache architecture that integrates resistive RAM (RRAM) devices into a conventional 6T-SRAM cell, forming a compact 6T-2R bit-cell. This hybrid cell enables a Processing-in-Memory (PIM) mode, which performs massively parallel multiply-and-accumulate (MAC) operations directly on cache power lines while preserving stored cache data. By exploiting the intrinsic properties of the 6T-2R structure, the architecture achieves additional storage capability and high computational throughput without any bit-cell area overhead. Circuit- and array-level simulations in GlobalFoundries 22 nm FDSOI technology demonstrate that the proposed design achieves a throughput of 0.4 TOPS and an energy efficiency of 491.78 TOPS/W. With 128 row-parallel operations, CIFAR-10 classification is demonstrated by mapping a ResNet-18 neural network, achieving an accuracy of 91.27%. These results highlight the potential of the NVM-in-Cache approach to serve as a scalable, energy-efficient computing method by repurposing the existing 6T SRAM cache architecture for next-generation AI accelerators and general-purpose processors.
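The abstract's compute-on-powerline MAC can be illustrated with a small numerical model: each active row draws a current set by its RRAM cell's conductance, the shared line sums those currents (Kirchhoff's current law), and an ADC digitizes the total into a MAC result. The sketch below is only a behavioral illustration under assumed device values; the conductances, read voltage, and ADC resolution are placeholders, not figures from the paper.

```python
# Behavioral sketch of a current-domain analog MAC on a shared line,
# loosely modeling the compute-on-powerline idea. All device parameters
# below are illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

N_ROWS = 128                   # row-parallelism reported in the paper
G_LRS, G_HRS = 50e-6, 1e-6     # assumed RRAM conductances (S): weight 1 / weight 0
V_READ = 0.2                   # assumed read voltage (V)
ADC_BITS = 5                   # assumed ADC resolution

weights = rng.integers(0, 2, N_ROWS)   # binary weights stored as RRAM states
inputs = rng.integers(0, 2, N_ROWS)    # binary activations gate the rows

# Each activated row contributes a current proportional to its cell
# conductance; the shared power line accumulates them.
cond = np.where(weights == 1, G_LRS, G_HRS)
i_line = float(np.sum(inputs * cond * V_READ))

# An ADC quantizes the line current into a digital code, which is then
# rescaled back to an estimated MAC value.
i_max = N_ROWS * G_LRS * V_READ
code = int(round(i_line / i_max * (2**ADC_BITS - 1)))
mac_analog = int(round(code / (2**ADC_BITS - 1) * N_ROWS))

mac_ideal = int(np.dot(inputs, weights))
print("ideal MAC:", mac_ideal, "| analog estimate:", mac_analog)
```

The estimate deviates from the exact dot product because of ADC quantization and the small leakage current through high-resistance (weight-0) cells, which is one source of the accuracy gap analog PIM designs must manage.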
Problem

Research questions and friction points this paper is trying to address.

DNN workloads demand large on-chip SRAM, which now occupies a substantial fraction of die area
The memory–compute separation makes conventional DNN acceleration energy-inefficient
Storage density and computation efficiency must improve without adding bit-cell area
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embeds two RRAM devices into a standard 6T-SRAM cell to form a 6T-2R bit-cell
Introduces a Compute-on-Powerline scheme that performs MAC operations directly on cache power lines
Enables analog processing-in-memory without bit-cell area overhead while preserving cache data
Subhradip Chakraborty
Department of Electrical and Computer Engineering, University of Wisconsin Madison, Madison, USA
Ankur Singh
Applied Scientist at Adobe
Xuming Chen
Department of Electrical, Computer, and Systems Engineering, Case School of Engineering, Case Western Reserve University, USA
G. Datta
Department of Electrical, Computer, and Systems Engineering, Case School of Engineering, Case Western Reserve University, USA
Akhilesh R. Jaiswal
Department of Electrical and Computer Engineering, University of Wisconsin Madison, Madison, USA