🤖 AI Summary
Existing knowledge distillation (KD) methods apply uniform temperature scaling across all samples, which limits their ability to capture sample-level discriminative knowledge. To address this, we propose a novel energy-based distillation framework: for the first time, we incorporate energy modeling into the KD objective, explicitly optimizing the student model's energy-based decision boundary. Our method introduces three key components: (i) a contrastive energy margin loss that enforces discriminative separation between classes; (ii) logits distribution calibration that aligns soft predictions; and (iii) teacher–student energy consistency constraints that preserve structural knowledge. Together, these components overcome the fine-grained discriminative modeling limitations of conventional KL-divergence-based distillation. Extensive experiments on CIFAR-100 and an ImageNet subset demonstrate consistent improvements of +1.8% to +2.3% in student Top-1 accuracy, along with substantial gains in discriminative confidence and adversarial robustness.
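
To make the three loss terms concrete, here is a minimal PyTorch sketch of how such an objective could be assembled. This is an illustrative reading of the summary, not the authors' implementation: the free-energy definition, the hinge form of the margin loss, the L2 energy-consistency penalty, and all names and weights (`free_energy`, `energy_margin_loss`, `tau`, `margin`, `lambda_*`) are assumptions.

```python
import torch
import torch.nn.functional as F

def free_energy(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """EBM view of a classifier: E(x) = -tau * logsumexp(logits / tau) (assumed)."""
    return -tau * torch.logsumexp(logits / tau, dim=1)

def energy_margin_loss(student_logits, labels, margin: float = 1.0):
    """(i) Contrastive energy margin (assumed hinge form): push the true-class
    energy below the closest competing class energy by at least `margin`."""
    class_energy = -student_logits                      # per-class energy e_k = -logit_k
    true_energy = class_energy.gather(1, labels.unsqueeze(1))
    mask = F.one_hot(labels, student_logits.size(1)).bool()
    other_energy = class_energy.masked_fill(mask, float("inf")).min(dim=1, keepdim=True).values
    return F.relu(margin + true_energy - other_energy).mean()

def calibrated_kd_loss(student_logits, teacher_logits, tau: float = 4.0):
    """(ii) Logits distribution calibration: temperature-softened KL between
    teacher and student predictions (standard soft-label distillation term)."""
    log_p_s = F.log_softmax(student_logits / tau, dim=1)
    p_t = F.softmax(teacher_logits / tau, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau * tau

def energy_consistency_loss(student_logits, teacher_logits, tau: float = 1.0):
    """(iii) Teacher-student energy consistency: match the free energies
    the two models assign to each sample (assumed L2 penalty)."""
    return F.mse_loss(free_energy(student_logits, tau), free_energy(teacher_logits, tau))

def total_distillation_loss(student_logits, teacher_logits, labels,
                            lambda_margin=0.5, lambda_kd=1.0, lambda_energy=0.1):
    """Combine cross-entropy with the three terms above; weights are placeholders."""
    ce = F.cross_entropy(student_logits, labels)
    return (ce
            + lambda_margin * energy_margin_loss(student_logits, labels)
            + lambda_kd * calibrated_kd_loss(student_logits, teacher_logits)
            + lambda_energy * energy_consistency_loss(student_logits, teacher_logits))
```

In this reading, the teacher's logits are frozen (computed under `torch.no_grad()`) and only the student is updated, as in standard KD; the summary does not specify the loss weights or margin, so those hyperparameters would need tuning per dataset.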