🤖 AI Summary
Existing knowledge distillation (KD) methods apply uniform temperature scaling across all samples, which limits their ability to capture sample-level discriminative knowledge. To address this, we propose a novel energy-based distillation framework: for the first time, we incorporate energy modeling into the KD objective, explicitly optimizing the student model's energy-based decision boundary. Our method introduces three key components: (i) a contrastive energy margin loss that enforces discriminative separation between classes; (ii) logits distribution calibration that aligns soft predictions; and (iii) teacher–student energy consistency constraints that preserve structural knowledge. Together, these components overcome the fine-grained discriminative modeling limitations of conventional KL-divergence-based distillation. Extensive experiments on CIFAR-100 and an ImageNet subset demonstrate consistent improvements of +1.8% to +2.3% in student Top-1 accuracy, along with substantial gains in discriminative confidence and adversarial robustness.
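
To make the three loss terms concrete, here is a minimal PyTorch sketch of how such an objective could be assembled. This is an illustrative reading of the summary, not the authors' implementation: the free-energy definition, the hinge form of the margin loss, the L2 energy-consistency penalty, and all names and weights (`free_energy`, `energy_margin_loss`, `tau`, `margin`, `lambda_*`) are assumptions.

```python
import torch
import torch.nn.functional as F

def free_energy(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """EBM view of a classifier: E(x) = -tau * logsumexp(logits / tau) (assumed)."""
    return -tau * torch.logsumexp(logits / tau, dim=1)

def energy_margin_loss(student_logits, labels, margin: float = 1.0):
    """(i) Contrastive energy margin (assumed hinge form): push the true-class
    energy below the closest competing class energy by at least `margin`."""
    class_energy = -student_logits                      # per-class energy e_k = -logit_k
    true_energy = class_energy.gather(1, labels.unsqueeze(1))
    mask = F.one_hot(labels, student_logits.size(1)).bool()
    other_energy = class_energy.masked_fill(mask, float("inf")).min(dim=1, keepdim=True).values
    return F.relu(margin + true_energy - other_energy).mean()

def calibrated_kd_loss(student_logits, teacher_logits, tau: float = 4.0):
    """(ii) Logits distribution calibration: temperature-softened KL between
    teacher and student predictions (standard soft-label distillation term)."""
    log_p_s = F.log_softmax(student_logits / tau, dim=1)
    p_t = F.softmax(teacher_logits / tau, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau * tau

def energy_consistency_loss(student_logits, teacher_logits, tau: float = 1.0):
    """(iii) Teacher-student energy consistency: match the free energies
    the two models assign to each sample (assumed L2 penalty)."""
    return F.mse_loss(free_energy(student_logits, tau), free_energy(teacher_logits, tau))

def total_distillation_loss(student_logits, teacher_logits, labels,
                            lambda_margin=0.5, lambda_kd=1.0, lambda_energy=0.1):
    """Combine cross-entropy with the three terms above; weights are placeholders."""
    ce = F.cross_entropy(student_logits, labels)
    return (ce
            + lambda_margin * energy_margin_loss(student_logits, labels)
            + lambda_kd * calibrated_kd_loss(student_logits, teacher_logits)
            + lambda_energy * energy_consistency_loss(student_logits, teacher_logits))
```

In this reading, the teacher's logits are frozen (computed under `torch.no_grad()`) and only the student is updated, as in standard KD; the summary does not specify the loss weights or margin, so those hyperparameters would need tuning per dataset.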