Task-Specific Knowledge Distillation from the Vision Foundation Model for Enhanced Medical Image Segmentation

📅 2025-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of annotated data in medical image segmentation, this paper proposes a task-specific knowledge distillation framework. First, a vision foundation model (VFM) is lightly fine-tuned using Low-Rank Adaptation (LoRA) in a task-oriented manner; subsequently, the adapted knowledge is distilled into a compact U-Net student model. Concurrently, a diffusion model generates high-fidelity synthetic data to augment the distillation training set. This work is the first to integrate task-specific VFM fine-tuning into the knowledge distillation pipeline, enabling precise and efficient transfer of domain-specific knowledge. Evaluated on five medical segmentation benchmarks, the method significantly outperforms task-agnostic distillation and state-of-the-art self-supervised approaches—including MoCo v3 and MAE—under extreme low-data regimes: +28% Dice on KidneyUS (80 labeled samples) and +11% Dice over MAE on CHAOS (100 labeled samples).
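The summary above describes distilling the LoRA-adapted teacher's predictions into a compact student. The paper's exact loss is not given here; a common formulation of such a distillation objective (a minimal NumPy sketch, with illustrative `alpha` and temperature `T` values, not the authors' code) combines hard-label cross-entropy with a softened teacher/student KL term:

```python
import numpy as np

def softmax(z, T=1.0, axis=-1):
    # Temperature-scaled softmax over class logits (numerically stabilized).
    e = np.exp((z - z.max(axis=axis, keepdims=True)) / T)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style KD objective applied per pixel/sample.

    student_logits, teacher_logits: (N, C) class logits.
    labels: (N,) integer ground-truth classes.
    alpha weights the hard-label term vs. the soft teacher term.
    """
    p_s = softmax(student_logits, T)
    p_t = softmax(teacher_logits, T)
    # KL(teacher || student), scaled by T^2 as is standard in KD.
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean() * T**2
    # Cross-entropy of the student against the hard labels (T = 1).
    p_hard = softmax(student_logits, 1.0)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * ce + (1 - alpha) * kl
```

In the paper's setting, the teacher logits would come from the LoRA-fine-tuned VFM and the student would be the compact U-Net; the diffusion-generated images enlarge the unlabeled transfer set over which the KL term is computed.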

📝 Abstract
Large-scale pre-trained models, such as Vision Foundation Models (VFMs), have demonstrated impressive performance across various downstream tasks by transferring generalized knowledge, especially when target data is limited. However, their high computational cost and the domain gap between natural and medical images limit their practical application in medical segmentation tasks. Motivated by this, we pose the following important question: "How can we effectively utilize the knowledge of large pre-trained VFMs to train a small, task-specific model for medical image segmentation when training data is limited?" To address this problem, we propose a novel and generalizable task-specific knowledge distillation framework. Our method fine-tunes the VFM on the target segmentation task to capture task-specific features before distilling the knowledge to smaller models, leveraging Low-Rank Adaptation (LoRA) to reduce the computational cost of fine-tuning. Additionally, we incorporate synthetic data generated by diffusion models to augment the transfer set, enhancing model performance in data-limited scenarios. Experimental results across five medical image datasets demonstrate that our method consistently outperforms task-agnostic knowledge distillation and self-supervised pretraining approaches like MoCo v3 and Masked Autoencoders (MAE). For example, on the KidneyUS dataset, our method achieved a 28% higher Dice score than task-agnostic KD using 80 labeled samples for fine-tuning. On the CHAOS dataset, it achieved an 11% improvement over MAE with 100 labeled samples. These results underscore the potential of task-specific knowledge distillation to train accurate, efficient models for medical image segmentation in data-constrained settings.
Problem

Research questions and friction points this paper is trying to address.

Utilize large pre-trained Vision Foundation Models for medical image segmentation.
Reduce computational cost and bridge the natural-to-medical domain gap.
Enhance model performance with limited training data using knowledge distillation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-specific knowledge distillation from Vision Foundation Models
Low-Rank Adaptation reduces fine-tuning computational cost
Synthetic data augmentation enhances performance in data-limited scenarios
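The LoRA technique listed above freezes the pretrained weights and learns only a low-rank additive update, which is what keeps fine-tuning the VFM cheap. A minimal NumPy sketch of one adapted linear layer (toy dimensions, forward pass only; not the paper's implementation):

```python
import numpy as np

class LoRALinear:
    """Frozen dense layer W plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, W, r=4, alpha=8, seed=0):
        d_out, d_in = W.shape
        rng = np.random.default_rng(seed)
        self.W = W                                     # frozen pretrained weights
        self.A = rng.normal(0, 0.01, size=(r, d_in))   # trainable, small random init
        self.B = np.zeros((d_out, r))                  # trainable, zero init: no change at start
        self.scale = alpha / r

    def __call__(self, x):
        # y = x W^T + (alpha/r) * (x A^T) B^T; only A and B are updated in fine-tuning.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

Because B starts at zero, the adapted layer initially reproduces the pretrained model exactly, and only 2 * r * d parameters per layer are trained instead of the full d_out * d_in matrix.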
Pengchen Liang
Shanghai Key Laboratory of Gastric Neoplasms, Department of Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China; School of Microelectronics, Shanghai University, Shanghai, 201800, China
Haishan Huang
School of Software Engineering, Sun Yat-sen University, Zhuhai, 519000, China
Bin Pu
The Hong Kong University of Science and Technology | HNU | NTU
Computer vision, medical image analysis, ultrasound image processing, AI4Science
Jianguo Chen
School of Software Engineering, Sun Yat-sen University, Zhuhai, 519000, China
Xiang Hua
School of Software Engineering, Sun Yat-sen University, Zhuhai, 519000, China
Jing Zhang
Department of Radiology, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, 200025, China
Weibo Ma
School of Public Administration, East China Normal University, Shanghai, 200062, China
Zhuangzhuang Chen
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, SAR, China
Yiwei Li
Department of Nuclear Medicine, Shanghai Children’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200062, China
Qing Chang
Mechanical and Aerospace Engineering, University of Virginia
Smart manufacturing system modeling and analysis, production control, HRC