Data-Augmented Quantization-Aware Knowledge Distillation

📅 2025-09-03

📈 Citations: 0

✨ Influential: 0

career value

154K/year

🤖 AI Summary

This work addresses the lack of principled guidance for data augmentation selection in low-bit quantization-aware training–based knowledge distillation (QAT-KD). We present the first systematic investigation into the intrinsic relationship between augmentation strategies and QAT-KD performance. To this end, we propose a training-free evaluation metric—Contextual Mutual Information (CMI)—which quantifies information consistency between augmented samples and the quantized teacher-student model pair, enabling automatic ranking and selection of optimal augmentations. The method requires no additional training and is agnostic to specific QAT or KD algorithms. Extensive experiments across diverse architectures (ResNet, MobileNet) and benchmarks (CIFAR-100, ImageNet) demonstrate that CMI-guided augmentation consistently improves QAT-KD accuracy by 1.2–2.8% in Top-1 accuracy, validating its effectiveness, generalizability, and plug-and-play applicability.

Technology Category

Application Category

📝 Abstract

Quantization-aware training (QAT) and Knowledge Distillation (KD) are combined to achieve competitive performance in creating low-bit deep learning models. Existing KD and QAT works focus on improving the accuracy of quantized models from the network output perspective by designing better KD loss functions or optimizing QAT's forward and backward propagation. However, limited attention has been given to understanding the impact of input transformations, such as data augmentation (DA). The relationship between quantization-aware KD and DA remains unexplored. In this paper, we address the question: how to select a good DA in quantization-aware KD, especially for the models with low precisions? We propose a novel metric which evaluates DAs according to their capacity to maximize the Contextual Mutual Information--the information not directly related to an image's label--while also ensuring the predictions for each class are close to the ground truth labels on average. The proposed method automatically ranks and selects DAs, requiring minimal training overhead, and it is compatible with any KD or QAT algorithm. Extensive evaluations demonstrate that selecting DA strategies using our metric significantly improves state-of-the-art QAT and KD works across various model architectures and datasets.

Problem

Research questions and friction points this paper is trying to address.

Selecting optimal data augmentation for quantization-aware knowledge distillation

Maximizing contextual mutual information in low-precision models

Automatically ranking augmentation strategies with minimal training overhead

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines quantization-aware training with knowledge distillation

Proposes metric for data augmentation selection

Automatically ranks augmentations by mutual information

🔎 Similar Papers

No similar papers found.