🤖 AI Summary
To address the challenge of balancing accuracy and energy efficiency in CNN deployment on compute-in-memory (CIM) accelerators, this paper proposes a synergistic quantization scheme featuring binary weights and multi-bit activations. The method comprises three key contributions: (1) deriving closed-form solutions for layer-wise weight binarization to maximize the representational capacity of binary weights; (2) designing a differentiable activation quantization function that accurately approximates ideal multi-bit behavior without hyperparameter tuning; and (3) integrating end-to-end training with CIM hardware-aware simulation for realistic validation. Experimental results show accuracy improvements of 1.44–5.46% on CIFAR-10 and 0.35–5.37% on ImageNet over baseline methods. Hardware simulations demonstrate that 4-bit activations achieve an optimal trade-off between performance and area/energy cost. To the best of the authors' knowledge, this is the first work enabling high-accuracy, efficient CNN deployment on CIM hardware using binary weights paired with multi-bit activations.
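To make the binarization idea concrete, here is a minimal sketch of binary weight quantization with a closed-form scaling factor. Note this uses the classic per-layer scale alpha = mean(|W|) (the XNOR-Net style least-squares solution); the paper derives its own layer-wise closed-form solutions, which are not reproduced here.

```python
import numpy as np

def binarize_weights(w):
    """Binarize a weight tensor to {-alpha, +alpha}.

    alpha = mean(|w|) is the closed-form minimizer of
    ||w - alpha * sign(w)||^2 over alpha (XNOR-Net style);
    the paper's own layer-wise closed-form solution differs.
    """
    alpha = np.mean(np.abs(w))       # closed-form per-layer scale
    return alpha * np.sign(w), alpha

# Example: a small weight vector
w = np.array([0.5, -1.0, 1.5])
w_bin, alpha = binarize_weights(w)   # alpha = 1.0, w_bin = [1.0, -1.0, 1.5 -> 1.0]
```

During training, such a binarizer is typically paired with a straight-through estimator so gradients flow to the latent full-precision weights.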
📝 Abstract
Compute-in-memory (CIM) accelerators have emerged as a promising approach for enhancing the energy efficiency of convolutional neural networks (CNNs). Deploying CNNs on CIM platforms generally requires quantization of network weights and activations to meet hardware constraints. However, existing approaches either prioritize hardware efficiency with binary weight and activation quantization at the cost of accuracy, or use multi-bit weights and activations for greater accuracy but limited efficiency. In this paper, we introduce a novel binary weight multi-bit activation (BWMA) method for CNNs on CIM-based accelerators. Our contributions include: deriving closed-form solutions for weight quantization in each layer, significantly improving the representational capability of binarized weights; and developing a differentiable function for activation quantization that approximates the ideal multi-bit function while bypassing the extensive search for optimal settings. Through comprehensive experiments on the CIFAR-10 and ImageNet datasets, we show that BWMA achieves notable accuracy improvements over existing methods, with gains of 1.44–5.46% and 0.35–5.37% on the respective datasets. Moreover, hardware simulation results indicate that 4-bit activation quantization strikes the optimal balance between hardware cost and model performance.
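The differentiable activation quantizer can be illustrated with a generic soft-staircase construction: a sum of shifted sigmoids that approximates the ideal uniform multi-bit quantizer and converges to it as a steepness parameter grows. This is a common technique for differentiable quantization, not the paper's exact function.

```python
import numpy as np

def soft_quantize(x, bits=4, temperature=100.0):
    """Differentiable approximation of uniform b-bit quantization on [0, 1].

    The ideal staircase with 2^bits - 1 steps is approximated by a sum of
    shifted sigmoids; `temperature` controls step steepness, and as
    temperature -> infinity the function converges to hard quantization.
    Generic sketch only; the paper's differentiable function may differ.
    """
    levels = 2 ** bits - 1
    x = np.clip(x, 0.0, 1.0)
    y = np.zeros_like(x, dtype=float)
    for k in range(1, levels + 1):
        # One sigmoid step centered at each quantization threshold
        y += 1.0 / (1.0 + np.exp(-temperature * (x - (k - 0.5) / levels)))
    return y / levels

# At high temperature, the output is close to round(x * levels) / levels
x = np.array([0.0, 0.6, 1.0])
y = soft_quantize(x, bits=2, temperature=100.0)
```

Because every term is a sigmoid, gradients with respect to `x` are well defined everywhere, avoiding the straight-through approximation for the activation path.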