🤖 AI Summary
This work addresses the limitations of conventional CRAM, which suffers from gate-level errors induced by stochastic MRAM switching and is constrained by sequential write operations, and therefore fails to meet the reliability and scalability demands of deep neural network (DNN) acceleration. To overcome these challenges, the authors propose CRAM-ER, an error-resilient architecture that combines spintronic CRAM with CMOS adder trees to enable high-density, energy-efficient in-memory matrix-vector multiplication. The design also incorporates an error-aware model fine-tuning strategy and a fine-grained error-correction mechanism to improve fault tolerance. Evaluations on DNN benchmarks show near-lossless inference accuracy, a reduction in CRAM latency of up to two orders of magnitude, and better energy efficiency and energy-delay product than CPU/GPU systems paired with high-bandwidth DRAM.
📝 Abstract
Deep neural networks (DNNs) have achieved state-of-the-art performance across diverse domains. However, typical von Neumann compute paradigms face severe memory bottlenecks. Emerging near-memory and compute-in-memory approaches alleviate this bottleneck but incur significant peripheral overhead. Computational Random Access Memory (CRAM) based on MRAM enables in-situ logic without peripheral overhead, offering a dense, energy-efficient solution. However, probabilistic MRAM switching induces gate-level errors that limit the scalability and reliability of CRAM for accelerating DNNs. Moreover, the large number of sequential MRAM writes severely constrains CRAM throughput. To address these challenges, we propose an error-resilient CRAM (CRAM-ER) architecture for scalable in-memory matrix-vector multiplications (MVMs). Our error-aware hardware-software co-design framework leverages a hybrid spintronic-CRAM + CMOS adder-tree architecture to mitigate the impact of device-level errors, demonstrating MVM functionality with high area and energy efficiency. We further develop error-aware model fine-tuning and fine-grained error correction for enhanced error resilience. Evaluations of the CMOS+spintronic hybrid architecture on DNN benchmarks show near-lossless accuracy while reducing CRAM latency by up to two orders of magnitude, outperforming CPU/GPU+high-bandwidth DRAM in both energy efficiency and energy-delay product.
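To make the interaction between gate-level errors and exact accumulation concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes, purely for illustration, that the in-memory array produces single-bit AND partial products whose outputs flip independently with probability `p_flip`, and that an exact adder tree sums the shifted partial products. The abstract does not specify this partitioning or error model, and the names `noisy_and`, `adder_tree`, `mvm`, `bits`, and `p_flip` are hypothetical.

```python
import random


def noisy_and(a, b, p_flip, rng):
    """AND of two bits; the output flips with probability p_flip (assumed error model)."""
    out = a & b
    if rng.random() < p_flip:
        out ^= 1
    return out


def adder_tree(values):
    """Exact pairwise reduction, standing in for an error-free CMOS adder tree."""
    values = list(values)
    if not values:
        return 0
    while len(values) > 1:
        nxt = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:  # carry an unpaired element to the next level
            nxt.append(values[-1])
        values = nxt
    return values[0]


def mvm(weights, x, bits=4, p_flip=0.0, seed=0):
    """Unsigned matrix-vector product built from noisy bitwise ANDs and exact sums."""
    rng = random.Random(seed)
    result = []
    for row in weights:
        partials = []
        for w, xi in zip(row, x):
            for a in range(bits):        # bit position in the weight
                for b in range(bits):    # bit position in the input
                    bit = noisy_and((w >> a) & 1, (xi >> b) & 1, p_flip, rng)
                    partials.append(bit << (a + b))
        result.append(adder_tree(partials))
    return result


if __name__ == "__main__":
    W = [[1, 2], [3, 4]]
    x = [5, 6]
    print("ideal:", mvm(W, x))                 # no gate errors
    print("noisy:", mvm(W, x, p_flip=0.01))    # rare bit flips perturb the sums
```

In this toy model, a flip in a high-order partial product shifts the output far more than a flip in a low-order one, which gives some intuition for why error correction applied at fine granularity could be worthwhile. How CRAM-ER actually allocates its correction effort is not described in the abstract.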