🤖 AI Summary
To address the challenge of balancing accuracy and energy efficiency in CNN deployment on compute-in-memory (CIM) accelerators, this paper proposes a synergistic quantization scheme featuring binary weights and multi-bit activations. The method comprises three key contributions: (1) deriving closed-form solutions for layer-wise weight binarization to maximize the representational capacity of binary weights; (2) designing a differentiable activation quantization function that accurately approximates ideal multi-bit behavior without hyperparameter tuning; and (3) integrating end-to-end training with CIM hardware-aware simulation for realistic validation. Experimental results show accuracy improvements of 1.44–5.46% on CIFAR-10 and 0.35–5.37% on ImageNet over baseline methods. Hardware simulations demonstrate that 4-bit activations achieve an optimal trade-off between performance and area/energy cost. To the best of the authors' knowledge, this is the first work enabling high-accuracy, efficient CNN deployment on CIM hardware using binary weights paired with multi-bit activations.
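To make the binarization idea concrete, here is a minimal sketch of binary weight quantization with a closed-form scaling factor. Note this uses the classic per-layer scale alpha = mean(|W|) (the XNOR-Net style least-squares solution); the paper derives its own layer-wise closed-form solutions, which are not reproduced here.

```python
import numpy as np

def binarize_weights(w):
    """Binarize a weight tensor to {-alpha, +alpha}.

    alpha = mean(|w|) is the closed-form minimizer of
    ||w - alpha * sign(w)||^2 over alpha (XNOR-Net style);
    the paper's own layer-wise closed-form solution differs.
    """
    alpha = np.mean(np.abs(w))       # closed-form per-layer scale
    return alpha * np.sign(w), alpha

# Example: a small weight vector
w = np.array([0.5, -1.0, 1.5])
w_bin, alpha = binarize_weights(w)   # alpha = 1.0, w_bin = [1.0, -1.0, 1.5 -> 1.0]
```

During training, such a binarizer is typically paired with a straight-through estimator so gradients flow to the latent full-precision weights.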
📝 Abstract
Compute-in-memory (CIM) accelerators have emerged as a promising approach for enhancing the energy efficiency of convolutional neural networks (CNNs). Deploying CNNs on CIM platforms generally requires quantization of network weights and activations to meet hardware constraints. However, existing approaches either prioritize hardware efficiency with binary weight and activation quantization at the cost of accuracy, or use multi-bit weights and activations for greater accuracy but limited efficiency. In this paper, we introduce a novel binary weight multi-bit activation (BWMA) method for CNNs on CIM-based accelerators. Our contributions include: deriving closed-form solutions for weight quantization in each layer, significantly improving the representational capability of binarized weights; and developing a differentiable function for activation quantization that approximates the ideal multi-bit function while bypassing the extensive search for optimal settings. Through comprehensive experiments on the CIFAR-10 and ImageNet datasets, we show that BWMA achieves notable accuracy improvements over existing methods, with gains of 1.44–5.46% and 0.35–5.37% on the respective datasets. Moreover, hardware simulation results indicate that 4-bit activation quantization strikes the optimal balance between hardware cost and model performance.
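The differentiable activation quantizer can be illustrated with a generic soft-staircase construction: a sum of shifted sigmoids that approximates the ideal uniform multi-bit quantizer and converges to it as a steepness parameter grows. This is a common technique for differentiable quantization, not the paper's exact function.

```python
import numpy as np

def soft_quantize(x, bits=4, temperature=100.0):
    """Differentiable approximation of uniform b-bit quantization on [0, 1].

    The ideal staircase with 2^bits - 1 steps is approximated by a sum of
    shifted sigmoids; `temperature` controls step steepness, and as
    temperature -> infinity the function converges to hard quantization.
    Generic sketch only; the paper's differentiable function may differ.
    """
    levels = 2 ** bits - 1
    x = np.clip(x, 0.0, 1.0)
    y = np.zeros_like(x, dtype=float)
    for k in range(1, levels + 1):
        # One sigmoid step centered at each quantization threshold
        y += 1.0 / (1.0 + np.exp(-temperature * (x - (k - 0.5) / levels)))
    return y / levels

# At high temperature, the output is close to round(x * levels) / levels
x = np.array([0.0, 0.6, 1.0])
y = soft_quantize(x, bits=2, temperature=100.0)
```

Because every term is a sigmoid, gradients with respect to `x` are well defined everywhere, avoiding the straight-through approximation for the activation path.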