🤖 AI Summary
Digital SRAM-based processing-in-memory (PIM) architectures struggle to exploit value-level and bit-level sparsity jointly, and static zero-value/zero-bit handling incurs significant energy overhead. To address this, we propose Dyadic Block PIM (DB-PIM), an algorithm-architecture co-design framework. On the algorithm side, DB-PIM introduces a hybrid-grained pruning strategy with a novel sparsity pattern that unifies value- and bit-level sparsity; on the architecture side, it pairs a dynamic block computation paradigm with custom sparse SRAM-PIM macros built around an input pre-processing unit (IPU), dyadic block multiply units (DBMUs), and Canonical Signed Digit (CSD)-based adder trees. Unlike conventional crossbar-based PIMs constrained by static zero-value masking, DB-PIM enables adaptive, fine-grained sparsity exploitation. Evaluated on benchmark workloads, it achieves up to 8.01× speedup and 85.28% energy savings without accuracy loss, significantly improving both the energy efficiency and the computational flexibility of digital PIM systems.
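To make the two sparsity notions concrete, the sketch below (an illustrative measurement, not part of DB-PIM itself) counts value-level sparsity (the fraction of zero weights) and bit-level sparsity (the fraction of zero bits inside the non-zero weights, treated here as 8-bit unsigned magnitudes for simplicity):

```python
def value_sparsity(weights):
    """Fraction of weights that are exactly zero (value-level sparsity)."""
    return sum(1 for w in weights if w == 0) / len(weights)

def bit_sparsity(weights, bits=8):
    """Fraction of zero bits inside the non-zero weights (bit-level sparsity).

    Weights are treated as unsigned `bits`-bit magnitudes for illustration;
    real quantized models use signed fixed-point encodings.
    """
    nonzero = [w for w in weights if w != 0]
    zero_bits = sum(format(w, f"0{bits}b").count("0") for w in nonzero)
    return zero_bits / (len(nonzero) * bits)

weights = [0, 3, 0, 16]
print(value_sparsity(weights))  # 0.5: half the values are zero
print(bit_sparsity(weights))    # 0.8125: most bits of 3 and 16 are zero
```

Even this toy example shows why bit-level sparsity matters: the non-zero weights still consist mostly of zero bits, which a bit-serial PIM macro would otherwise spend cycles on.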
📝 Abstract
Processing-in-memory (PIM) is a transformative architectural paradigm designed to overcome the von Neumann bottleneck. Among PIM architectures, digital SRAM-PIM emerges as a promising solution, offering significant advantages by directly integrating digital logic within the SRAM array. However, its rigid crossbar architecture and full-array activation make it difficult to exploit traditional value-level sparsity efficiently. Moreover, neural network models exhibit a high proportion of zero bits within non-zero values, which remain underutilized due to architectural constraints. To overcome these limitations, we present Dyadic Block PIM (DB-PIM), a groundbreaking algorithm-architecture co-design framework that harnesses both value-level and bit-level sparsity. At the algorithm level, our hybrid-grained pruning technique, combined with a novel sparsity pattern, enables effective sparsity management. Architecturally, DB-PIM incorporates a sparse network and customized digital SRAM-PIM macros, including an input pre-processing unit (IPU), dyadic block multiply units (DBMUs), and Canonical Signed Digit (CSD)-based adder trees. The design circumvents structured zero values in weights, bypasses unstructured zero bits within non-zero weights, and skips block-wise all-zero bit columns in input features. As a result, the DB-PIM framework skips the majority of unnecessary computations, driving significant gains in computational efficiency. Results demonstrate that DB-PIM achieves up to 8.01× speedup and 85.28% energy savings, significantly boosting the computational efficiency of digital SRAM-PIM systems.
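The CSD-based adder trees rest on a standard property of Canonical Signed Digit recoding: representing a multiplier with digits in {-1, 0, 1} such that no two adjacent digits are non-zero minimizes the number of non-zero digits, so fewer partial products must be accumulated. A minimal Python sketch of the textbook CSD recoding (illustrative only; it does not reproduce the paper's hardware encoding):

```python
def to_csd(n):
    """Recode a positive integer into CSD digits {-1, 0, 1}, LSB first.

    CSD guarantees no two adjacent non-zero digits, which minimizes the
    number of add/subtract terms a multiplier needs.
    """
    digits = []
    while n != 0:
        if n % 2 == 0:
            digits.append(0)
        else:
            d = 2 - (n % 4)   # +1 if n ≡ 1 (mod 4), -1 if n ≡ 3 (mod 4)
            digits.append(d)
            n -= d
        n //= 2
    return digits

# 7 = 0b111 needs three partial products in plain binary,
# but CSD recodes it as 8 - 1, i.e. only two non-zero terms:
print(to_csd(7))  # [-1, 0, 0, 1]
```

In a bit-serial PIM datapath, every zero digit is a cycle or an adder input that can be skipped, which is the mechanism the CSD adder trees exploit.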