🤖 AI Summary
Existing knowledge distillation methods, such as deep mutual learning and self-distillation, achieve limited performance gains because they neglect how learning directions among networks evolve across training iterations. To address this, we propose a competitive distillation framework built on a multi-network collaborative training architecture. It introduces a dynamic teacher-student role-switching mechanism in which each network adaptively assumes the teacher or student role based on its real-time performance, coupled with stochastic perturbation that induces parameter mutations and facilitates global optimization. This approach breaks away from conventional unidirectional distillation, enhancing feature discriminability and model generalization. Extensive experiments on major visual classification benchmarks, including CIFAR-100 and ImageNet-1K, demonstrate consistent and significant improvements over state-of-the-art distillation methods, validating both effectiveness and cross-dataset generalizability.
📝 Abstract
Deep Neural Networks (DNNs) have significantly advanced the field of computer vision. To improve the DNN training process, knowledge distillation methods have demonstrated their effectiveness in accelerating network training by introducing a fixed learning direction from a teacher network to student networks. In this context, several distillation-based optimization strategies have been proposed, e.g., deep mutual learning and self-distillation, which attempt to achieve generic training performance gains through the cooperative training of multiple networks. However, such strategies achieve limited improvements due to a poor understanding of how learning directions among networks affect training across different iterations. In this paper, we propose a novel competitive distillation strategy that allows each network in a group to act as the teacher based on its performance, enhancing overall learning performance. Competitive distillation organizes a group of networks to perform a shared task and engage in competition, where competitive optimization is proposed to improve the parameter-updating process. We further introduce stochastic perturbation into competitive distillation, motivating networks to induce mutations that yield better visual representations and a global optimum. Experimental results show that competitive distillation achieves promising performance across diverse tasks and datasets.
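The three ingredients described above, performance-based teacher selection, a distillation pull on the students, and stochastic perturbation, can be illustrated with a minimal toy sketch. This is not the paper's implementation: the "networks" are single scalar weights fitting a 1-D regression target, and the names and the distillation weight `alpha` are illustrative assumptions.

```python
import random

# Hypothetical sketch of competitive distillation on a toy 1-D regression
# task: each "network" is one weight w trained to fit y = 3x. The step
# (1) picks the best performer as teacher, (2) pulls students toward it,
# and (3) randomly perturbs parameters, mirroring the abstract's mechanism.

def loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def competitive_distillation_step(weights, data, lr=0.05, alpha=0.5):
    # 1. Competition: the best-performing network becomes the teacher.
    losses = [loss(w, data) for w in weights]
    teacher = min(range(len(weights)), key=lambda i: losses[i])
    new_weights = []
    for i, w in enumerate(weights):
        # Task gradient of the mean squared error.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        if i != teacher:
            # 2. Distillation: students are additionally pulled toward the
            # teacher (gradient of alpha * (w - w_teacher)^2).
            grad += alpha * 2 * (w - weights[teacher])
        w = w - lr * grad
        # 3. Stochastic perturbation: a small random mutation to help
        # escape local optima (applied to every network in this sketch).
        w += random.gauss(0.0, 0.01)
        new_weights.append(w)
    return new_weights

random.seed(0)
data = [(x / 10, 3 * x / 10) for x in range(1, 11)]
weights = [random.uniform(-1.0, 1.0) for _ in range(3)]
for _ in range(200):
    weights = competitive_distillation_step(weights, data)
print([round(w, 2) for w in weights])
```

Because the teacher is re-elected every step, a student that overtakes it immediately switches roles, which is the dynamic role-switching behavior the abstract contrasts with fixed unidirectional distillation.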