Adaptive Distribution-aware Quantization for Mixed-Precision Neural Networks

📅 2025-10-22
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address two key challenges in quantization-aware training (QAT), namely non-uniform activation distributions and static weight quantization codebooks ill-suited to dynamic parameter shifts, this paper proposes the Adaptive Distribution-aware Quantization (ADQ) framework. ADQ jointly enables dynamic distribution alignment and hardware-efficient low-bit quantization via three core components: quantile-based codebook initialization, an online codebook adaptation mechanism leveraging exponential moving averages, and a sensitivity-driven mixed-precision allocation strategy. The method integrates non-uniform-to-uniform mapping, online distribution modeling, and co-optimization of bit-width allocation with layer-wise sensitivity. On ImageNet, ResNet-18 quantized by ADQ achieves 71.512% Top-1 accuracy at an average bit-width of 2.81 bits, substantially outperforming prior state-of-the-art methods. Ablation studies confirm the effectiveness of each component. The primary contribution is the first unified, lightweight QAT framework that simultaneously incorporates distribution adaptivity, online codebook updating, and sensitivity-aware precision assignment.
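The summary's first component, quantile-based codebook initialization, can be sketched as follows. The paper's exact construction is not reproduced here; this sketch assumes the codebook for b-bit weights places its 2^b codewords at the centers of equal-probability bins of the empirical weight distribution, so that codewords are dense where weights are dense (function names are illustrative):

```python
def quantile_codebook(weights, bits):
    # Place 2^bits codewords at the centers of equal-probability bins
    # of the empirical weight distribution (assumed initialization).
    levels = 2 ** bits
    w = sorted(weights)
    n = len(w)
    return [w[min(n - 1, int((i + 0.5) / levels * n))] for i in range(levels)]

def quantize(weights, codebook):
    # Map each weight to its nearest codeword.
    return [min(codebook, key=lambda c: abs(x - c)) for x in weights]
```

Compared with uniform initialization, such a codebook wastes no codewords on empty regions of a bell-shaped weight distribution, which is the motivation the summary attributes to distribution-aware initialization.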

๐Ÿ“ Abstract
Quantization-Aware Training (QAT) is a critical technique for deploying deep neural networks on resource-constrained devices. However, existing methods often face two major challenges: the highly non-uniform distribution of activations and the static, mismatched codebooks used in weight quantization. To address these challenges, we propose Adaptive Distribution-aware Quantization (ADQ), a mixed-precision quantization framework that employs a differentiated strategy. The core of ADQ is a novel adaptive weight quantization scheme comprising three key innovations: (1) a quantile-based initialization method that constructs a codebook closely aligned with the initial weight distribution; (2) an online codebook adaptation mechanism based on Exponential Moving Average (EMA) to dynamically track distributional shifts; and (3) a sensitivity-informed strategy for mixed-precision allocation. For activations, we integrate a hardware-friendly non-uniform-to-uniform mapping scheme. Comprehensive experiments validate the effectiveness of our method. On ImageNet, ADQ enables a ResNet-18 to achieve 71.512% Top-1 accuracy with an average bit-width of only 2.81 bits, outperforming state-of-the-art methods under comparable conditions. Furthermore, detailed ablation studies on CIFAR-10 systematically demonstrate the individual contributions of each innovative component, validating the rationale and effectiveness of our design.
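The abstract's second innovation, online codebook adaptation via EMA, can be sketched under one plausible reading: after each training step, every codeword drifts toward the mean of the weights currently assigned to it, smoothed by an exponential moving average. The update rule and `momentum` value are illustrative assumptions, not the paper's exact formulation:

```python
def ema_update_codebook(weights, codebook, momentum=0.99):
    # One online adaptation step (assumed form): each codeword drifts
    # toward the mean of the weights currently assigned to it via an
    # exponential moving average; codewords with no assigned weights
    # are left unchanged.
    assignments = [[] for _ in codebook]
    for x in weights:
        nearest = min(range(len(codebook)), key=lambda k: abs(x - codebook[k]))
        assignments[nearest].append(x)
    updated = []
    for c, assigned in zip(codebook, assignments):
        if assigned:
            c = momentum * c + (1 - momentum) * (sum(assigned) / len(assigned))
        updated.append(c)
    return updated
```

Because the EMA only nudges codewords, the codebook tracks gradual distributional shifts during training without the cost of re-fitting quantiles at every step.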
Problem

Research questions and friction points this paper is trying to address.

Addresses non-uniform activation distributions in neural network quantization
Solves static mismatched codebook issues in weight quantization
Proposes adaptive mixed-precision framework for resource-constrained deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive weight quantization with quantile-based codebook initialization
Online codebook adaptation using Exponential Moving Average tracking
Mixed-precision allocation guided by layer-wise sensitivity analysis
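The third innovation, sensitivity-guided mixed-precision allocation, is not specified in detail in the summary or abstract. One common shape such a strategy takes, shown here purely as an illustrative greedy sketch rather than the paper's algorithm, is: start every layer at the lowest bit-width, then repeatedly promote the most sensitive layer that still fits under an average bit-width budget (such as the 2.81-bit average reported for ResNet-18):

```python
def allocate_bits(sensitivities, avg_budget, choices=(2, 3, 4)):
    # Illustrative greedy allocation: begin with the lowest bit-width
    # everywhere, then promote layers in descending order of sensitivity
    # while the average bit-width stays within the budget.
    n = len(sensitivities)
    bits = [min(choices)] * n
    order = sorted(range(n), key=lambda i: sensitivities[i], reverse=True)
    changed = True
    while changed:
        changed = False
        for i in order:
            higher = [c for c in choices if c > bits[i]]
            if not higher:
                continue
            trial = bits[:]
            trial[i] = min(higher)  # promote by one step
            if sum(trial) / n <= avg_budget:
                bits = trial
                changed = True
    return bits
```

Under this scheme the most sensitive layers end up with the extra precision, while insensitive layers stay at the lowest bit-width, which is the trade-off the bullet above describes.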
Shaohang Jia
School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
Zhiyong Huang
Associate Professor, Department of Computer Science, School of Computing, NUS
Machine Learning, Computer Graphics, Computer Vision, Multimedia, Databases
Zhi Yu
School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
Mingyang Hou
School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
Shuai Miao
School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
Han Yang
School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China