Commercial Evaluation of Zero-Skipping MAC Design for Bit Sparsity Exploitation in DL Inference

📅 2024-02-29
🏛️ IEEE/IFIP International Conference on Very Large Scale Integration of System-on-Chip
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the energy-efficiency bottleneck of deep learning inference hardware, this work re-implements and evaluates OzMAC (Omit-zero-MAC), a modified version of the Bit-Pragmatic (PRA) multiply-accumulate (MAC) design that dynamically exploits bit-level sparsity. To assess commercial implementation potential, it compares OzMAC against a conventional binary MAC using post-synthesis results in the TSMC N5 process node, across 4-, 8-, and 16-bit precisions and clock frequencies of 0.5, 1, and 1.5 GHz. At 8-bit precision, OzMAC reduces area by 21%, power by 70%, and energy by 28% relative to the binary MAC baseline. When its clock frequency is scaled to match the baseline's throughput, the 8-bit OzMAC still improves both power and energy by 30%. The results show that high bit sparsity is prevalent in pretrained INT8 DL workloads and can be exploited in hardware to build more energy-efficient AI accelerators.
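Bit sparsity, as used in the summary above, can be quantified as the fraction of zero bits in the stored operands. The following is a minimal sketch, assuming each value is stored as a fixed-width two's-complement integer; the paper's exact counting convention (encoding, and whether weights, activations, or both are counted) is not given here, and the function name `bit_sparsity` is hypothetical.

```python
from typing import Iterable


def bit_sparsity(values: Iterable[int], bits: int = 8) -> float:
    """Fraction of zero bits across the fixed-width two's-complement encodings of `values`.

    Assumption: values may lie in either the signed or the unsigned `bits`-bit range.
    """
    mask = (1 << bits) - 1
    lo, hi = -(1 << (bits - 1)), (1 << bits) - 1
    total_bits = 0
    set_bits = 0
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"{v} does not fit in {bits} bits")
        # v & mask yields the two's-complement bit pattern; count its ones.
        set_bits += bin(v & mask).count("1")
        total_bits += bits
    if total_bits == 0:
        raise ValueError("values must be non-empty")
    return 1 - set_bits / total_bits


if __name__ == "__main__":
    print(bit_sparsity([0, 1, 3, -1]))  # 0.65625 (21 of 32 bits are zero)
```

In two's complement a small negative value such as -1 has no zero bits at all, so the measured sparsity depends on the encoding chosen.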

📝 Abstract
General Matrix Multiply (GEMM) units, consisting of multiply-accumulate (MAC) arrays, perform the bulk of the computation in deep learning (DL). Recent work has proposed a novel MAC design, Bit-Pragmatic (PRA), capable of dynamically exploiting bit sparsity. This work presents OzMAC (Omit-zero-MAC), a modified re-implementation of PRA, and extends beyond earlier works by performing a rigorous post-synthesis evaluation against a binary MAC design across multiple bitwidths and clock frequencies, using the TSMC N5 process node to assess commercial implementation potential. We demonstrate the existence of high bit sparsity in eight pretrained INT8 DL workloads and show that the 8-bit OzMAC significantly improves all three metrics of area, power, and energy, by 21%, 70%, and 28%, respectively. Similar improvements are achieved when scaling data precision (4, 8, and 16 bits) and clock frequency (0.5 GHz, 1 GHz, and 1.5 GHz). When the clock frequency of the 8-bit OzMAC is scaled to normalize throughput, it still achieves a 30% improvement in both power and energy.
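The zero-skipping idea behind PRA and OzMAC can be illustrated with a small behavioral model. This is a hypothetical Python sketch, not the authors' OzMAC design: it assumes unsigned activations (e.g., post-ReLU) processed bit-serially, signed weights processed bit-parallel, one shift-and-add cycle per nonzero activation bit, and zero cycles for an all-zero activation. It ignores lane synchronization, pipelining, and other hardware overheads. The names `essential_bits`, `zero_skip_dot`, `binary_dot`, and `throughput_matched_clock` are illustrative.

```python
from typing import Sequence


def essential_bits(a: int, bits: int = 8) -> list[int]:
    """Positions of the set ("essential") bits of an unsigned activation."""
    if not 0 <= a < (1 << bits):
        raise ValueError(f"activation {a} does not fit in {bits} unsigned bits")
    return [pos for pos in range(bits) if (a >> pos) & 1]


def zero_skip_dot(weights: Sequence[int], activations: Sequence[int],
                  bits: int = 8) -> tuple[int, int]:
    """Dot product using one shift-and-add step per nonzero activation bit.

    Returns (result, cycles) under the simplified cycle model (an assumption).
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    acc = 0
    cycles = 0
    for w, a in zip(weights, activations):
        for pos in essential_bits(a, bits):
            acc += w << pos  # w * 2**pos; zero bits are never processed
            cycles += 1
    return acc, cycles


def binary_dot(weights: Sequence[int], activations: Sequence[int]) -> tuple[int, int]:
    """Baseline: a bit-parallel binary MAC completes one multiply-accumulate per cycle."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    return sum(w * a for w, a in zip(weights, activations)), len(weights)


def throughput_matched_clock(binary_clock_ghz: float, mean_cycles_per_mac: float) -> float:
    """Clock at which the zero-skipping MAC matches a one-MAC-per-cycle baseline."""
    return binary_clock_ghz * mean_cycles_per_mac


if __name__ == "__main__":
    w = [3, -2, 5, 7]
    a = [0b00000101, 0, 0b10000000, 0]  # 5, 0, 128, 0: three nonzero bits in total
    print(zero_skip_dot(w, a))  # (655, 3)
    print(binary_dot(w, a))     # (655, 4)
```

Under this model the cycle count per MAC equals the number of nonzero activation bits, so a dense activation such as 255 costs eight cycles while a zero costs none. Average cost therefore tracks bit sparsity, which is why matching a binary MAC's throughput may call for a different OzMAC clock, as `throughput_matched_clock` computes under the same assumptions.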
Problem

Research questions and friction points this paper is trying to address.

Zero-skipping MAC
Deep Learning Efficiency
Energy Consumption Reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

OzMAC
Zero-skipping MAC
Energy efficiency
Authors

Harideep Nair, Carnegie Mellon University
P. Vellaisamy, Indian Institute of Technology Bombay (statistics, fractional stochastic process, applied probability)
Tsung-Han Lin, MediaTek USA Inc.
Perry Wang, MediaTek USA Inc.
Shawn Blanton, Carnegie Mellon University
J. Shen, Carnegie Mellon University