Progressive Class-level Distillation

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
In conventional knowledge distillation, high-confidence classes dominate logit alignment, causing low-probability yet highly discriminative classes to be overlooked—thereby limiting the completeness of knowledge transfer. To address this, we propose a stage-wise, priority-driven class-level distillation framework. Our method introduces a novel bidirectional staged distillation mechanism that jointly optimizes fine-to-coarse progressive learning and coarse-to-fine reverse refinement. Furthermore, we design a dynamic priority identification scheme based on logit-difference ranking, coupled with multi-stage grouped distillation, enabling precise and sufficient logit alignment across class groups of varying difficulty. Extensive experiments on multiple classification and object detection benchmarks demonstrate substantial improvements over state-of-the-art methods, particularly in recognizing long-tailed and low-confidence classes.
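As a rough picture of the priority scheme described above, the sketch below ranks classes by their mean absolute teacher-student logit gap and splits the ranking into one group per stage. This is a minimal reading of "logit-difference ranking", assuming PyTorch and equal-size groups; `make_stage_groups` and `num_stages` are hypothetical names, not the paper's published code.

```python
import torch

def make_stage_groups(t_logits, s_logits, num_stages=3):
    """Hypothetical sketch: rank classes by mean absolute teacher-student
    logit gap and split the ranking into one class group per stage."""
    gap = (t_logits - s_logits).abs().mean(dim=0)  # per-class gap over the batch
    order = torch.argsort(gap, descending=True)    # largest gap = highest priority
    return torch.chunk(order, num_stages)          # tuple of class-index groups
```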

📝 Abstract
In knowledge distillation (KD), logit distillation (LD) aims to transfer class-level knowledge from a more powerful teacher network to a small student model via accurate teacher-student alignment at the logit level. Since high-confidence object classes usually dominate the distillation process, low-probability classes, which also carry discriminative information, are downplayed in conventional methods, leading to insufficient knowledge transfer. To address this issue, we propose a simple yet effective LD method termed Progressive Class-level Distillation (PCD). In contrast to existing methods, which perform all-class ensemble distillation, our PCD approach performs stage-wise distillation for step-by-step knowledge transfer. More specifically, we rank the teacher-student logit differences to identify distillation priorities from scratch, and subsequently divide the entire LD process into multiple stages. Next, bidirectional stage-wise distillation incorporating fine-to-coarse progressive learning and reverse coarse-to-fine refinement is conducted, allowing comprehensive knowledge transfer via sufficient logit alignment within separate class groups at different distillation stages. Extensive experiments on public benchmark datasets demonstrate the superiority of our method over state-of-the-art approaches on both classification and detection tasks.
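For context, the "all-class ensemble distillation" that PCD contrasts itself with is typically the standard temperature-scaled KD objective, where a single KL term covers the whole class distribution and high-confidence classes dominate the gradient. A minimal baseline sketch (standard Hinton-style KD, not this paper's method):

```python
import torch.nn.functional as F

def vanilla_kd_loss(t_logits, s_logits, T=4.0):
    """Standard all-class logit distillation: one KL divergence over the
    full softened class distribution, which lets high-confidence classes
    dominate the gradient signal."""
    t = F.softmax(t_logits / T, dim=1)
    s = F.log_softmax(s_logits / T, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)
```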
Problem

Research questions and friction points this paper is trying to address.

Addresses insufficient knowledge transfer in logit distillation caused by dominant high-confidence classes
Proposes stage-wise distillation that transfers knowledge step by step via class ranking
Enhances logit alignment through bidirectional progressive learning and refinement stages (see the group-level sketch after this list)
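One plausible reading of "logit alignment within separate class groups" is a KL term restricted to each group's classes, with both distributions renormalized over that subset so that low-probability classes are no longer drowned out by the full softmax. A hedged sketch; the subset renormalization and temperature handling are assumptions:

```python
import torch.nn.functional as F

def group_kd_loss(t_logits, s_logits, group, T=4.0):
    """Hypothetical group-restricted alignment: KL between teacher and
    student distributions renormalized over one class group's logits."""
    t = F.softmax(t_logits[:, group] / T, dim=1)
    s = F.log_softmax(s_logits[:, group] / T, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)
```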
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Class-level Distillation for stepwise knowledge transfer
Bidirectional stage-wise distillation for comprehensive transfer
Ranking teacher-student logit differences to prioritize distillation stages (a combined sketch follows this list)
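Chaining the hypothetical helpers from the sketches above, one way the bidirectional schedule could fit together is shown below: fine-to-coarse alignment over growing unions of priority groups, then coarse-to-fine refinement over the individual groups in reverse. The unweighted sum and the exact staging order are assumptions, not the paper's procedure.

```python
import torch

def pcd_step(t_logits, s_logits, num_stages=3):
    """Sketch of one bidirectional pass, reusing the hypothetical
    make_stage_groups and group_kd_loss defined above."""
    groups = make_stage_groups(t_logits, s_logits, num_stages)
    loss = 0.0
    merged = []
    for g in groups:                 # fine-to-coarse: widen the aligned class set
        merged.append(g)
        loss = loss + group_kd_loss(t_logits, s_logits, torch.cat(merged))
    for g in reversed(groups):       # coarse-to-fine: refine each group separately
        loss = loss + group_kd_loss(t_logits, s_logits, g)
    return loss
```

A real implementation would combine this with the task loss and likely reweight the per-stage terms; the sketch only illustrates the control flow implied by the abstract.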
Jiayan Li
School of Computer and Electronic Information, Nanjing Normal University, Nanjing, China, 210023
Jun Li
School of Computer and Electronic Information, Nanjing Normal University, Nanjing, China, 210023
Zhourui Zhang
School of Computer and Electronic Information, Nanjing Normal University, Nanjing, China, 210023
Jianhua Xu
University of Electronic Science and Technology of China
Multi-Agent, Evolutionary Games, LLM-Agents