No One-Size-Fits-All: A Workload-Driven Characterization of Bit-Parallel vs. Bit-Serial Data Layouts for Processing-using-Memory

📅 2025-09-26
🤖 AI Summary
Problem: The PIM community has long treated bit-parallel (BP) and bit-serial (BS) data layouts as interchangeable, so architects lack systematic, workload-aware criteria for choosing between them. Method: This work presents the first systematic, workload-driven characterization of the two layouts, using iso-area, cycle-accurate architectural models evaluated on MIMDRAM microworkloads and PIMBench applications such as VGG. Contribution/Results: No single layout is universally superior. BP excels on control-flow-intensive workloads with irregular memory access patterns, whereas BS shows substantial advantages in massively parallel, low-precision (e.g., INT4/INT8) AI computations. The authors distill these findings into design guidelines for workload-aware and potentially hybrid PIM systems.

📝 Abstract
Processing-in-Memory (PIM) is a promising approach to overcoming the memory-wall bottleneck. However, the PIM community has largely treated its two fundamental data layouts, Bit-Parallel (BP) and Bit-Serial (BS), as if they were interchangeable. This implicit "one-layout-fits-all" assumption, often hard-coded into existing evaluation frameworks, creates a critical gap: architects lack systematic, workload-driven guidelines for choosing the optimal data layout for their target applications. To address this gap, this paper presents the first systematic, workload-driven characterization of BP and BS PIM architectures. We develop iso-area, cycle-accurate BP and BS PIM architectural models and conduct a comprehensive evaluation using a diverse set of benchmarks. Our suite includes both fine-grained microworkloads from MIMDRAM to isolate specific operational characteristics, and large-scale applications from the PIMBench suite, such as the VGG network, to represent realistic end-to-end workloads. Our results quantitatively demonstrate that no single layout is universally superior; the optimal choice is strongly dependent on workload characteristics. BP excels on control-flow-intensive tasks with irregular memory access patterns, whereas BS shows substantial advantages in massively parallel, low-precision (e.g., INT4/INT8) computations common in AI. Based on this characterization, we distill a set of actionable design guidelines for architects. This work challenges the prevailing one-size-fits-all view on PIM data layouts and provides a principled foundation for designing next-generation, workload-aware, and potentially hybrid PIM systems.
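To make the two layouts concrete, here is a minimal conceptual sketch in Python. It is not the paper's architectural model: the function names (`to_bit_serial`, `from_bit_serial`, `bit_serial_add`) are hypothetical, and Python integers stand in for memory rows. In the BP view each value sits in one word; in the BS view the values are transposed so that plane `i` holds bit `i` of every value, and an addition walks the planes one bit position at a time while updating all lanes together.

```python
def to_bit_serial(values, width):
    """Transpose values into bit planes: planes[i] packs bit i of every value,
    with lane k of the plane holding the bit from values[k]."""
    planes = []
    for bit in range(width):
        plane = 0
        for lane, v in enumerate(values):
            plane |= ((v >> bit) & 1) << lane
        planes.append(plane)
    return planes


def from_bit_serial(planes, lanes):
    """Inverse transpose: rebuild one integer per lane from the bit planes."""
    values = []
    for lane in range(lanes):
        v = 0
        for bit, plane in enumerate(planes):
            v |= ((plane >> lane) & 1) << bit
        values.append(v)
    return values


def bit_serial_add(a_planes, b_planes):
    """Ripple-carry addition over bit planes.

    Takes one step per bit position; each step is a bulk bitwise operation
    across all lanes. The final carry is dropped, so results wrap modulo
    2**width.
    """
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)                 # sum bit for every lane
        carry = (a & b) | (carry & (a ^ b))       # carry-out for every lane
    return out


# Bit-parallel view: one value per word.
a = [3, 5, 7, 15]
b = [1, 2, 8, 1]

# Bit-serial view: 4 planes (INT4), each covering all 4 lanes.
sums = from_bit_serial(
    bit_serial_add(to_bit_serial(a, 4), to_bit_serial(b, 4)), lanes=4
)
```

In this toy model the number of steps in `bit_serial_add` equals the operand width and does not depend on the number of lanes, which gives some intuition for why narrow INT4/INT8 operands over many parallel elements suit a bit-serial layout. It says nothing about real cycle counts or area.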
Problem

Research questions and friction points this paper is trying to address.

Characterizing Bit-Parallel versus Bit-Serial PIM layouts for optimal workload performance
Addressing the lack of systematic guidelines for choosing PIM data layouts
Determining which layout performs better for control-flow-intensive workloads and which for massively parallel, low-precision workloads
Innovation

Methods, ideas, or system contributions that make the work stand out.

Characterizes Bit-Parallel vs Bit-Serial PIM layouts systematically
Develops iso-area cycle-accurate models for workload-driven evaluation
Provides guidelines for workload-aware and hybrid PIM system design