🤖 AI Summary
Existing zero-shot quantization (ZSQ) methods suffer from significant activation distortion and accuracy degradation at low bit-widths due to coarse-grained scaling. To address this, we propose a layer-channel joint-aware dynamic fine-grained quantization framework that introduces the first vectorized activation quantization mechanism. Our method models the joint intra-layer, inter-channel activation distribution to adaptively determine the optimal quantization granularity, and it jointly optimizes scaling factors and zero points to minimize reconstruction error—requiring neither training data nor fine-tuning. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art ZSQ methods across diverse architectures (e.g., ResNet, ViT) and tasks (e.g., image classification, object detection), achieving accuracy comparable to data-dependent quantization-aware training (QAT) while significantly improving both inference efficiency and deployment practicality.
📝 Abstract
Zero-shot quantization (ZSQ) enables neural network compression without training data, which is crucial in environments with restricted data access. However, existing ZSQ methods suffer from significant activation loss in low-bit settings owing to their coarse-grained scaling strategy. To address this issue, we propose GranQ, a novel ZSQ approach that leverages layer-channel awareness to minimize quantization error. Unlike conventional layer- or channel-wise quantization, GranQ dynamically adjusts quantization granularity by considering both layer- and channel-level activation distributions, enabling fine-grained quantization while minimizing activation distortion. Additionally, we introduce vectorized activation quantization, which enables efficient parallel computation and reduces computational overhead while preserving accuracy. GranQ achieves superior performance compared with state-of-the-art ZSQ methods, including those that employ quantization-aware training. With these findings, we anticipate that GranQ will inspire new research directions beyond conventional ZSQ approaches focused on data generation and model training.
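To make the coarse- vs. fine-grained distinction concrete, the sketch below contrasts per-layer scaling (one scale and zero point for the whole activation tensor) with per-channel scaling (one pair per channel, computed in a vectorized way). This is a minimal NumPy illustration under our own assumptions, not GranQ's actual implementation; the function names and the toy activation tensor are hypothetical.

```python
import numpy as np

def quantize_per_layer(x, num_bits=4):
    """Coarse-grained: a single scale/zero-point for the whole tensor."""
    qmax = 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min) / qmax
    zero_point = np.round(-x_min / scale)
    q = np.clip(np.round(x / scale + zero_point), 0, qmax)
    return (q - zero_point) * scale  # dequantized reconstruction

def quantize_per_channel(x, num_bits=4):
    """Fine-grained: one scale/zero-point per channel of an (N, C, H, W)
    activation tensor, computed with vectorized reductions."""
    qmax = 2 ** num_bits - 1
    # Per-channel min/max over batch and spatial dims, kept broadcastable
    x_min = x.min(axis=(0, 2, 3), keepdims=True)
    x_max = x.max(axis=(0, 2, 3), keepdims=True)
    scale = (x_max - x_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard degenerate channels
    zero_point = np.round(-x_min / scale)
    q = np.clip(np.round(x / scale + zero_point), 0, qmax)
    return (q - zero_point) * scale

rng = np.random.default_rng(0)
# Toy activations whose channels have very different dynamic ranges:
# a single layer-wide scale is dominated by the widest channel and
# crushes the narrow ones, which is the distortion described above.
x = rng.normal(size=(8, 4, 16, 16)) \
    * np.array([0.1, 1.0, 5.0, 20.0]).reshape(1, 4, 1, 1)

err_layer = np.mean((x - quantize_per_layer(x)) ** 2)
err_channel = np.mean((x - quantize_per_channel(x)) ** 2)
```

On such skewed channel ranges, the per-channel reconstruction error is far below the per-layer one; GranQ's contribution, beyond this standard contrast, is choosing the granularity adaptively from the joint layer-channel distribution.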