Powerful Teachers Matter: Text-Guided Multi-view Knowledge Distillation with Visual Prior Enhancement

📅 2026-03-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes Text-guided Multi-view Knowledge Distillation (TMKD), which addresses a limitation of existing knowledge distillation methods: they often overlook the quality of the knowledge the teacher provides. TMKD introduces a dual-modality teacher collaboration mechanism that pairs a visual teacher with CLIP-derived textual supervision. It enriches visual priors through multi-view augmentation and edge/high-frequency feature extraction, while semantic weights generated from text prompts guide adaptive feature fusion. A vision-language contrastive regularization further strengthens the student model's semantic comprehension. Evaluated on five benchmark datasets, TMKD improves distillation accuracy by up to 4.49%, outperforming current state-of-the-art methods.

πŸ“ Abstract
Knowledge distillation transfers knowledge from large teacher models to smaller students for efficient inference. While existing methods primarily focus on distillation strategies, they often overlook the importance of enhancing teacher knowledge quality. In this paper, we propose Text-guided Multi-view Knowledge Distillation (TMKD), which leverages dual-modality teachers, a visual teacher and a text teacher (CLIP), to provide richer supervisory signals. Specifically, we enhance the visual teacher with multi-view inputs incorporating visual priors (edge and high-frequency features), while the text teacher generates semantic weights through prior-aware prompts to guide adaptive feature fusion. Additionally, we introduce vision-language contrastive regularization to strengthen semantic knowledge in the student model. Extensive experiments on five benchmarks demonstrate that TMKD consistently improves knowledge distillation performance by up to 4.49%, validating the effectiveness of our dual-teacher multi-view enhancement strategy. Code is available at https://anonymous.4open.science/r/TMKD-main-44D1.
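The two text-guided components the abstract describes can be sketched in miniature: semantic weights come from the similarity between a text embedding and each view's features (softmax-normalized into fusion weights), and the contrastive regularization pulls matching image/text embeddings together against mismatched pairs. This is a minimal stdlib-Python illustration of those two ideas, not the paper's implementation; all function names, the toy vectors, and the temperature values are invented for the example.

```python
import math

def softmax(xs, temp=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp((x - m) / temp) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def text_guided_fusion(view_feats, text_emb, temp=0.5):
    """Fuse multi-view teacher features using 'semantic weights':
    softmax over each view's similarity to a text embedding."""
    scores = [cosine(f, text_emb) for f in view_feats]
    weights = softmax(scores, temp)
    dim = len(view_feats[0])
    fused = [sum(w * f[i] for w, f in zip(weights, view_feats))
             for i in range(dim)]
    return fused, weights

def contrastive_loss(img_embs, txt_embs, tau=0.1):
    """InfoNCE-style vision-language regularizer: each image embedding
    should be closest to its paired text embedding."""
    total = 0.0
    for i, v in enumerate(img_embs):
        sims = [cosine(v, t) / tau for t in txt_embs]
        total += -math.log(softmax(sims)[i])
    return total / len(img_embs)

# Toy example: three "views" (raw, edge, high-frequency) of one image.
views = [[1.0, 0.0, 0.2], [0.4, 0.9, 0.1], [0.1, 0.2, 0.8]]
text = [0.9, 0.1, 0.3]  # stand-in for a CLIP text embedding
fused, weights = text_guided_fusion(views, text)

# Toy contrastive pairs: two student embeddings vs. two text embeddings.
students = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.9, 0.1], [0.1, 0.9]]
loss = contrastive_loss(students, texts)
```

In this sketch the view whose features align best with the text embedding receives the largest fusion weight, and the contrastive loss is small when each student embedding is already closest to its paired text.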
Problem

Research questions and friction points this paper is trying to address.

knowledge distillation
teacher model
visual prior
semantic knowledge
multi-view
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-view knowledge distillation
visual prior enhancement
text-guided fusion
vision-language contrastive learning
dual-modality teachers
Xin Zhang
Hangzhou Dianzi University
image processing, machine learning

Jianyang Xu
Hangzhou Dianzi University

Hao Peng
Hangzhou Dianzi University

Dongjing Wang
Hangzhou Dianzi University

Jingyuan Zheng
Hangzhou Dianzi University

Yu Li
Hangzhou Dianzi University

Yuyu Yin
Hangzhou Dianzi University

Hongbo Wang
Hangzhou Dianzi University