MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity

πŸ“… 2024-07-29
πŸ›οΈ Proceedings of the AAAI Conference on Artificial Intelligence
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Vision Transformers (ViTs) suffer severe accuracy degradation under low-bit data-free quantization (DFQ), primarily because synthetic data induces misaligned multi-head attention maps. Method: This work proposes a cross-head attention similarity enhancement framework that jointly optimizes synthetic data generation and quantized network calibration. Specifically, it introduces head-level attention alignment to guide high-fidelity synthetic sample generation, and designs structured attention distillation to match the distributional characteristics across attention headsβ€”without access to real data. Contribution/Results: On ImageNet, the proposed method achieves only a 1.2% Top-1 accuracy drop for 4-bit quantized DeiT-T, substantially outperforming existing DFQ approaches and establishing new state-of-the-art performance.

πŸ“ Abstract
Data-free quantization (DFQ) is a technique that creates a lightweight network from its full-precision counterpart without the original training data, often through a synthetic dataset. Although several DFQ methods have been proposed for vision transformer (ViT) architectures, they fail to achieve efficacy in low-bit settings. Examining the existing methods, we observe that their synthetic data produce misaligned attention maps, while those of the real samples are highly aligned. From this observation, we find that aligning attention maps of synthetic data helps improve the overall performance of quantized ViTs. Motivated by this finding, we devise MimiQ, a novel DFQ method designed for ViTs that enhances inter-head attention similarity. First, we generate synthetic data by aligning head-wise attention outputs from each spatial query patch. Then, we align the attention maps of the quantized network to those of the full-precision teacher by applying head-wise structural attention distillation. The experimental results show that the proposed method significantly outperforms baselines, setting a new state-of-the-art for ViT-DFQ.
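The abstract's core idea of encouraging inter-head attention similarity can be illustrated with a small sketch. The loss below is a hypothetical reconstruction, not the paper's exact objective: for each spatial query patch, it measures pairwise cosine similarity between the attention maps of different heads and rewards agreement across heads. The function name and tensor layout are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def inter_head_alignment_loss(attn: torch.Tensor) -> torch.Tensor:
    """Hypothetical loss encouraging attention maps of different heads to align.

    attn: (B, H, N, N) per-head attention maps, where H is the number
    of heads and N the number of query/key patches. Assumed layout; the
    paper's actual formulation may differ.
    """
    B, H, N, _ = attn.shape
    # L2-normalize each query patch's attention row so the dot product
    # below is a cosine similarity.
    a = F.normalize(attn, dim=-1)                     # (B, H, N, N)
    # Pairwise cosine similarity between every pair of heads, per query patch.
    sim = torch.einsum("bhqk,bgqk->bhgq", a, a)       # (B, H, H, N)
    # Drop self-similarity on the head diagonal, then maximize the mean
    # off-diagonal similarity (i.e., minimize its negative).
    eye = torch.eye(H, device=attn.device).view(1, H, H, 1)
    off_diag = sim * (1 - eye)
    return -off_diag.sum() / (B * H * (H - 1) * N)
```

Such a term could be applied both when synthesizing data (backpropagated into the images) and, in a structural-distillation variant, by matching the quantized network's per-head maps to the full-precision teacher's; this sketch shows only the similarity measure itself.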
Problem

Research questions and friction points this paper is trying to address.

Enhancing low-bit data-free quantization for Vision Transformers
Aligning synthetic data attention maps with real samples
Improving inter-head attention similarity in quantized ViTs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns synthetic data attention maps
Uses head-wise structural distillation
Enhances inter-head attention similarity
Kanghyun Choi
Seoul National University
Hyeyoon Lee
Seoul National University
Dain Kwon
Seoul National University
Sunjong Park
Seoul National University
Kyuyeun Kim
Google
Noseong Park
Tenured Associate Professor, KAIST
Artificial Intelligence
Jinho Lee
Department of Electrical and Computer Engineering, Seoul National University
Computer architecture, Computer systems, Machine learning