pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables

📅 2021-04-15
🏛️ MICRO
📈 Citations: 36
Influential: 1
🤖 AI Summary
To address the performance and energy-efficiency bottlenecks caused by data movement between main memory and processors, this paper proposes pLUTo, a DRAM-based Processing-in-Memory (PiM) architecture. Its core innovation is the first integration of configurable lookup tables (LUTs) embedded deeply within DRAM arrays, enabling bitline-level, analog-domain table lookups that replace complex operations (such as multiplication, division, and exponentiation) with low-overhead memory reads, eliminating the need for additional logic circuitry. Through batched read optimization and dynamic LUT mapping, pLUTo achieves high parallelism and flexibility while incurring only 10.2%–23.1% area overhead. Evaluated on 11 real-world workloads, pLUTo delivers 713× and 1.2× speedups over optimized CPU and GPU baselines, respectively, improves energy efficiency by 1855× and 39.5×, and outperforms state-of-the-art PiM architectures by 18.3×.
📝 Abstract
Data movement between the main memory and the processor is a key contributor to execution time and energy consumption in memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM). One category of PiM is Processing-using-Memory (PuM), in which computation takes place inside the memory array by exploiting intrinsic analog properties of the memory device. PuM yields high performance and energy efficiency, but existing PuM techniques support a limited range of operations. As a result, current PuM architectures cannot efficiently perform some complex operations (e.g., multiplication, division, exponentiation) without large increases in chip area and design complexity. To overcome these limitations of existing PuM architectures, we introduce pLUTo (processing-using-memory with lookup table (LUT) operations), a DRAM-based PuM architecture that leverages the high storage density of DRAM to enable the massively parallel storing and querying of lookup tables (LUTs). The key idea of pLUTo is to replace complex operations with low-cost, bulk memory reads (i.e., LUT queries) instead of relying on complex extra logic. We evaluate pLUTo across 11 real-world workloads that showcase the limitations of prior PuM approaches and show that our solution outperforms optimized CPU and GPU baselines by an average of 713× and 1.2×, respectively, while simultaneously reducing energy consumption by an average of 1855× and 39.5×. Across these workloads, pLUTo outperforms state-of-the-art PiM architectures by an average of 18.3×. We also show that different versions of pLUTo provide different levels of flexibility and performance at different additional DRAM area overheads (between 10.2% and 23.1%). pLUTo's source code and all scripts required to reproduce the results of this paper are openly and fully available at https://github.com/CMU-SAFARI/pLUTo.
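The key idea above, replacing a complex operation with a bulk LUT query, can be illustrated with a short software analogy. This is only a sketch of the LUT-query pattern, not pLUTo's in-DRAM mechanism: in the actual architecture the table is stored in a DRAM subarray and queried via row activations, whereas here the "query" is an ordinary array read on the host. The function names (`build_lut`) and the choice of squaring as the tabulated operation are illustrative assumptions.

```python
import numpy as np

def build_lut(fn, n_bits=8):
    """Precompute fn(x) for every possible n_bit input value.

    For a small operand width, the entire function can be tabulated once;
    evaluating it afterward costs only a memory read per element.
    """
    return np.array([fn(x) for x in range(1 << n_bits)])

# Tabulate 8-bit squaring; any unary function of a small operand
# (e.g., exponentiation, reciprocal) could be tabulated the same way.
square_lut = build_lut(lambda x: x * x)

data = np.array([3, 7, 200, 255], dtype=np.uint8)

# One bulk "LUT query" replaces per-element multiplications.
result = square_lut[data]
print(result.tolist())  # [9, 49, 40000, 65025]
```

Binary operations (e.g., 8-bit × 8-bit multiplication) can be handled the same way by concatenating the two operands into a single 16-bit index, at the cost of a proportionally larger table, which is why pLUTo leans on DRAM's high storage density.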
Problem

Research questions and friction points this paper is trying to address.

Data Movement
In-Memory Computing
Energy Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

DRAM-based Computing
Lookup Table Technology
Energy Efficiency