🤖 AI Summary
To address the high data-movement overhead, latency, and poor energy efficiency caused by the von Neumann bottleneck, this work advances a compute-in-memory (CIM) paradigm tailored to commodity DRAM. It demonstrates, for the first time, fully functional and programmable bit-level in-memory computation on unmodified, mass-produced DRAM chips, and it introduces a lightweight circuit-reconfiguration scheme that enables analog-domain computation at the bitline level together with fine-grained memory access. The work further presents a compute-memory co-optimized microarchitecture and a hardware-software co-designed programming model. Evaluation on real DRAM demonstrates a 10–100× improvement in computational energy efficiency and a 30–70% reduction in memory access latency. Together, these results establish a deployable hardware foundation for CIM systems and a complete, end-to-end design methodology, from architecture to programming model, enabling practical adoption of CIM in commodity memory technologies.
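The analog-domain, bitline-level computation mentioned above can be understood through charge sharing: when multiple DRAM rows are activated simultaneously, their cells share charge on the bitline, and the sense amplifier resolves the resulting voltage toward full rail. The sketch below is an illustrative model of that behavior, not the paper's circuit; the function name and the 0.5 threshold are assumptions for exposition.

```python
# Illustrative model (not the paper's implementation): simultaneous
# activation of several DRAM cells shares their charge on one bitline;
# the sense amplifier then resolves the averaged voltage to 0 or 1,
# which computes a bitwise majority of the stored values.

def sense_amp_majority(cell_values):
    """Model one bitline after multiple rows are activated at once.

    Each cell contributes its stored charge (1 -> VDD, 0 -> GND); the
    sense amplifier amplifies the averaged bitline voltage to full rail.
    """
    bitline_voltage = sum(cell_values) / len(cell_values)  # charge sharing
    return 1 if bitline_voltage > 0.5 else 0               # amplification

# Activating three rows at once yields MAJ(a, b, c) on every bitline.
assert sense_amp_majority([1, 1, 0]) == 1
assert sense_amp_majority([0, 0, 1]) == 0
```

In a real chip this happens in parallel on every bitline of the activated rows, which is the source of the paradigm's massive bit-level parallelism.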
📝 Abstract
Memory-centric computing aims to enable computation capability in and near all places where data is generated and stored. As such, it can greatly reduce the large negative performance and energy impact of data access and data movement by 1) fundamentally avoiding data movement, 2) reducing data access latency and energy, and 3) exploiting the large parallelism of memory arrays. Many recent studies show that memory-centric computing can greatly improve system performance and energy efficiency. Major industrial vendors and startup companies have recently introduced memory chips with sophisticated computation capabilities. Going forward, both the hardware and software stacks should be revisited and designed carefully to take advantage of memory-centric computing. This work describes several major recent advances in memory-centric computing, specifically in Processing-in-DRAM, a paradigm in which the operational characteristics of a DRAM chip are exploited and enhanced to perform computation on data stored in DRAM. Specifically, we describe 1) new techniques that slightly modify DRAM chips to enable both enhanced computation capability and easier programmability, 2) new experimental studies that demonstrate the functionally-complete bulk-bitwise computational capability of real commercial off-the-shelf DRAM chips, without any modifications to the DRAM chip or the interface, and 3) new DRAM designs that improve access granularity and efficiency, unleashing the true potential of Processing-in-DRAM.
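The "functionally-complete bulk-bitwise" claim in point 2 rests on a standard logic result: three-input majority (MAJ) plus NOT can express any Boolean function. The sketch below illustrates that composition in software; it is a conceptual model only, and the helper names are made up for this example, not APIs from the work described.

```python
# Illustrative sketch: MAJ plus NOT is functionally complete, which is
# the basis of functionally-complete bulk-bitwise computation in DRAM.
# AND and OR fall out of a three-input majority with one operand pinned
# to constant-0 or constant-1 (a reserved all-zeros or all-ones row).

def maj3(a, b, c):
    # Three-input majority, as produced by triple-row activation.
    return 1 if a + b + c >= 2 else 0

def bulk_and(xs, ys):
    # AND(x, y) == MAJ(x, y, 0), applied to every bit position at once.
    return [maj3(x, y, 0) for x, y in zip(xs, ys)]

def bulk_or(xs, ys):
    # OR(x, y) == MAJ(x, y, 1).
    return [maj3(x, y, 1) for x, y in zip(xs, ys)]

def bulk_not(xs):
    # Bitwise negation; with MAJ, this yields functional completeness.
    return [1 - x for x in xs]

row_a = [1, 0, 1, 1]
row_b = [1, 1, 0, 1]
assert bulk_and(row_a, row_b) == [1, 0, 0, 1]
assert bulk_or(row_a, row_b) == [1, 1, 1, 1]
assert bulk_not(row_a) == [0, 1, 0, 0]
```

Because each operation acts on an entire row of bits in one step, wider logic (e.g., addition built from MAJ and NOT) inherits the same row-level parallelism.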