🤖 AI Summary
In knowledge distillation (KD), a fixed temperature parameter leads to suboptimal training, and architectural disparities between teacher and student models cause logit-scale mismatch. To address this, we propose an adaptive dynamic temperature scheduling method that adjusts the temperature in real time based on the cross-entropy gap between the teacher's and student's output distributions. During early training, a higher temperature preserves soft-label smoothness to facilitate effective knowledge transfer; later, the temperature is progressively lowered to sharpen student predictions. The mechanism requires no architectural modifications and is plug-and-play compatible with mainstream KD frameworks. Extensive experiments across multiple vision and NLP benchmarks demonstrate consistent, significant improvements over static-temperature baselines, validating the method's effectiveness, generalizability, and practical utility.
📝 Abstract
Knowledge Distillation (KD) trains a smaller student model using a large, pre-trained teacher model, with temperature as a key hyperparameter controlling the softness of output probabilities. Traditional methods use a fixed temperature throughout training, which is suboptimal. Moreover, architectural differences between teacher and student often result in mismatched logit magnitudes. We demonstrate that students benefit from softer probabilities early in training but require sharper probabilities in later stages. We introduce Dynamic Temperature Scheduler (DTS), which adjusts temperature dynamically based on the cross-entropy loss gap between teacher and student. To our knowledge, this is the first temperature scheduling method that adapts based on the divergence between teacher and student distributions. Our method integrates seamlessly with existing KD frameworks. We validate DTS across multiple KD strategies on vision (CIFAR-100, Tiny-ImageNet) and NLP tasks (GLUE, Dolly, SelfIns, UnNI, S-NI), consistently outperforming static-temperature baselines. Code is available at https://github.com/Sibgat-Ul/DTS.
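To make the idea concrete, here is a minimal, self-contained sketch of gap-driven temperature scheduling. The mapping from the teacher–student cross-entropy gap to a temperature (the `tanh` interpolation between `t_min` and `t_max` below) is illustrative, not the paper's exact schedule, and all function names are hypothetical; see the linked repository for the actual DTS implementation.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer probabilities."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy of hard labels under the model's softmax."""
    p = softmax(logits)
    n = len(labels)
    return float(-np.log(p[np.arange(n), labels] + 1e-12).mean())

def dynamic_temperature(teacher_logits, student_logits, labels,
                        t_min=1.0, t_max=8.0):
    """Map the teacher-student loss gap to a temperature in [t_min, t_max).

    Early in training the student lags the teacher, the gap is large,
    and the temperature stays high (soft labels); as the gap shrinks,
    the temperature anneals toward t_min (sharper targets).
    """
    gap = max(cross_entropy(student_logits, labels)
              - cross_entropy(teacher_logits, labels), 0.0)
    w = np.tanh(gap)  # squash the non-negative gap into [0, 1)
    return t_min + (t_max - t_min) * w

def distillation_kl(teacher_logits, student_logits, T):
    """Hinton-style KD term: KL(teacher || student) at temperature T,
    scaled by T^2 to keep gradient magnitudes comparable across T."""
    pt = softmax(teacher_logits, T)
    ps = softmax(student_logits, T)
    kl = (pt * (np.log(pt + 1e-12) - np.log(ps + 1e-12))).sum(axis=-1).mean()
    return float(kl * T * T)
```

A per-batch training step would then compute `T = dynamic_temperature(...)` and combine `distillation_kl(...)` with the student's ordinary cross-entropy loss, exactly as in standard KD but with `T` recomputed each step instead of fixed.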