Binary Weight Multi-Bit Activation Quantization for Compute-in-Memory CNN Accelerators

📅 2025-08-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of balancing accuracy and energy efficiency in CNN deployment on compute-in-memory (CIM) accelerators, this paper proposes a synergistic quantization scheme featuring binary weights and multi-bit activations. The method comprises three key contributions: (1) deriving closed-form solutions for layer-wise weight binarization to maximize the representational capacity of binary weights; (2) designing a differentiable activation quantization function that accurately approximates ideal multi-bit behavior without hyperparameter tuning; and (3) integrating end-to-end training with CIM hardware-aware simulation for realistic validation. Experimental results show accuracy improvements of 1.44–5.46% on CIFAR-10 and 0.35–5.37% on ImageNet over baseline methods. Hardware simulations demonstrate that 4-bit activations achieve an optimal trade-off between performance and area/energy cost. To the best of the authors' knowledge, this is the first work enabling high-accuracy, efficient CNN deployment on CIM hardware using binary weights paired with multi-bit activations.
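The summary mentions closed-form solutions for layer-wise weight binarization, but does not reproduce the derivation. As a minimal sketch of what such a closed form can look like, the classical solution (as popularized by XNOR-Net) that minimizes ||W − αB||² over a binary matrix B ∈ {−1, +1} and a scale α is B = sign(W), α = mean(|W|); the paper's actual derivation may differ:

```python
import numpy as np

def binarize_layer(W):
    """Closed-form layer-wise weight binarization (illustrative sketch).

    Minimizes ||W - alpha * B||_F^2 over binary B in {-1, +1} and a
    per-layer scale alpha; the classical closed-form solution is
    B = sign(W), alpha = mean(|W|).
    """
    B = np.sign(W)
    B[B == 0] = 1.0  # map exact zeros to +1 so B stays strictly binary
    alpha = np.abs(W).mean()
    return alpha, B

# Example: a tiny 2x2 weight matrix
W = np.array([[0.5, -1.2], [0.3, -0.1]])
alpha, B = binarize_layer(W)
# alpha = (0.5 + 1.2 + 0.3 + 0.1) / 4 = 0.525
```

The scaled product αB then replaces W in the forward pass, which is what makes binary-weight layers amenable to CIM crossbar mapping.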

📝 Abstract
Compute-in-memory (CIM) accelerators have emerged as a promising way to enhance the energy efficiency of convolutional neural networks (CNNs). Deploying CNNs on CIM platforms generally requires quantization of network weights and activations to meet hardware constraints. However, existing approaches either prioritize hardware efficiency with binary weight and activation quantization at the cost of accuracy, or utilize multi-bit weights and activations for greater accuracy but limited efficiency. In this paper, we introduce a novel binary weight multi-bit activation (BWMA) method for CNNs on CIM-based accelerators. Our contributions include: deriving closed-form solutions for weight quantization in each layer, significantly improving the representational capabilities of binarized weights; and developing a differentiable function for activation quantization, approximating the ideal multi-bit function while bypassing the extensive search for optimal settings. Through comprehensive experiments on the CIFAR-10 and ImageNet datasets, we show that BWMA achieves notable accuracy improvements over existing methods, registering gains of 1.44%–5.46% and 0.35%–5.37% on the respective datasets. Moreover, hardware simulation results indicate that 4-bit activation quantization strikes the optimal balance between hardware cost and model performance.
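The abstract's second contribution is a differentiable function that approximates the ideal (staircase-shaped) multi-bit quantizer so that activations can be trained end-to-end. The paper's exact function is not given in this summary; one common construction, shown here purely as an assumed illustration, smooths the staircase with a sum of shifted sigmoids whose temperature `T` controls how sharply the smooth curve approaches the ideal steps:

```python
import numpy as np

def soft_multibit_quant(x, bits=4, T=50.0):
    """Differentiable approximation of a uniform multi-bit quantizer (sketch).

    A staircase with 2**bits levels on [0, 1] is approximated by a sum of
    shifted sigmoids; larger T makes the smooth curve closer to the ideal
    step function while keeping gradients well-defined everywhere.
    """
    n_levels = 2 ** bits
    step = 1.0 / (n_levels - 1)
    y = np.zeros_like(x, dtype=float)
    for k in range(1, n_levels):
        # each sigmoid contributes one step of height 1/(n_levels - 1),
        # centered halfway between consecutive quantization levels
        y += step / (1.0 + np.exp(-T * (x - (k - 0.5) * step)))
    return y
```

The output is monotone in `x` and bounded in [0, 1] by construction, which mirrors the behavior of the ideal quantizer while remaining differentiable for backpropagation.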
Problem

Research questions and friction points this paper is trying to address.

Balancing accuracy and efficiency in CIM CNN quantization
Optimizing binary weight and multi-bit activation representations
Developing closed-form and differentiable quantization solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Binary weight quantization with closed-form solutions
Differentiable multi-bit activation quantization function
Optimal 4-bit activation for hardware efficiency
Wenyong Zhou
The University of Hong Kong
Computer Vision

Zhengwu Liu
The University of Hong Kong (HKU) / Tsinghua University (THU)
brain machine interfaces, computing in memory, memristor

Yuan Ren
Department of Electrical and Electronic Engineering, The University of Hong Kong

Ngai Wong
Department of Electrical and Electronic Engineering, The University of Hong Kong