Exploiting DINOv3-Based Self-Supervised Features for Robust Few-Shot Medical Image Segmentation

📅 2026-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of annotated data in few-shot medical image segmentation by proposing DINO-AugSeg, a framework built on self-supervised DINOv3 features. To mitigate the domain gap between natural-image pretraining and medical imaging, the method introduces two components: WT-Aug, a wavelet-based feature-level augmentation module that perturbs frequency components to diversify the extracted features, and CG-Fuse, a context-guided fusion module that uses cross-attention to integrate semantically rich low-resolution features with spatially detailed high-resolution features. Experiments on six public benchmarks spanning five imaging modalities (MRI, CT, ultrasound, endoscopy, and dermoscopy) show that the method consistently outperforms existing approaches under limited-sample conditions.
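
To make the idea of wavelet-domain feature augmentation concrete, here is a minimal NumPy sketch. It is not the authors' WT-Aug implementation: the single-level Haar transform, the per-channel Gaussian scaling of the detail bands, and the names `haar_dwt2`, `haar_idwt2`, `wt_aug_sketch`, and `strength` are all assumptions chosen for illustration. The summary only states that frequency components of the extracted features are perturbed.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform over the last two axes (even H and W)."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency approximation band
    d1 = (a + b - c - d) / 2.0   # detail band 1
    d2 = (a - b + c - d) / 2.0   # detail band 2
    d3 = (a - b - c + d) / 2.0   # detail band 3
    return ll, d1, d2, d3


def haar_idwt2(ll, d1, d2, d3):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape[-2], ll.shape[-1]
    out = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=ll.dtype)
    out[..., 0::2, 0::2] = (ll + d1 + d2 + d3) / 2.0
    out[..., 0::2, 1::2] = (ll + d1 - d2 - d3) / 2.0
    out[..., 1::2, 0::2] = (ll - d1 + d2 - d3) / 2.0
    out[..., 1::2, 1::2] = (ll - d1 - d2 + d3) / 2.0
    return out


def wt_aug_sketch(feat, rng, strength=0.1):
    """Hypothetical augmentation: rescale each detail band by a random
    per-channel factor around 1 and leave the low-frequency band untouched."""
    ll, d1, d2, d3 = haar_dwt2(feat)
    scale_shape = feat.shape[:-2] + (1, 1)
    details = [
        band * rng.normal(1.0, strength, size=scale_shape)
        for band in (d1, d2, d3)
    ]
    return haar_idwt2(ll, *details)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((16, 32, 32))  # stand-in for a (C, H, W) feature map
    print(wt_aug_sketch(feat, rng, strength=0.2).shape)
```

Because only the detail bands are rescaled, the augmented feature map keeps the same low-frequency content as the input, which is one plausible way to add diversity without changing coarse structure.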

📝 Abstract
Deep learning-based automatic medical image segmentation plays a critical role in clinical diagnosis and treatment planning but remains challenging in few-shot scenarios due to the scarcity of annotated training data. Recently, self-supervised foundation models such as DINOv3, which were trained on large natural image datasets, have shown strong potential for dense feature extraction that can help with the few-shot learning challenge. Yet, their direct application to medical images is hindered by domain differences. In this work, we propose DINO-AugSeg, a novel framework that leverages DINOv3 features to address the few-shot medical image segmentation challenge. Specifically, we introduce WT-Aug, a wavelet-based feature-level augmentation module that enriches the diversity of DINOv3-extracted features by perturbing frequency components, and CG-Fuse, a contextual information-guided fusion module that exploits cross-attention to integrate semantic-rich low-resolution features with spatially detailed high-resolution features. Extensive experiments on six public benchmarks spanning five imaging modalities, including MRI, CT, ultrasound, endoscopy, and dermoscopy, demonstrate that DINO-AugSeg consistently outperforms existing methods under limited-sample conditions. The results highlight the effectiveness of incorporating wavelet-domain augmentation and contextual fusion for robust feature representation, suggesting DINO-AugSeg as a promising direction for advancing few-shot medical image segmentation. Code and data will be made available on https://github.com/apple1986/DINO-AugSeg.
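
The cross-attention fusion described in the abstract can be sketched roughly as follows. This is an illustrative NumPy example, not the CG-Fuse module itself: treating high-resolution features as queries and low-resolution features as keys and values, the residual connection, the equal channel counts, and the names `cross_attention_fuse`, `w_q`, `w_k`, and `w_v` (random stand-ins for learned projections) are all assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(hi, lo, w_q, w_k, w_v):
    """Fuse a high-resolution map hi (C, Hh, Wh) with a low-resolution map
    lo (C, Hl, Wl). Each high-resolution location attends over all
    low-resolution locations; the attended context is added residually.

    w_q, w_k: (C, D) projections; w_v: (C, C) projection.
    Returns the fused map (C, Hh, Wh) and the attention matrix (Nh, Nl).
    """
    c, hh, wh = hi.shape
    q_tokens = hi.reshape(c, -1).T             # (Nh, C)
    kv_tokens = lo.reshape(lo.shape[0], -1).T  # (Nl, C)

    q = q_tokens @ w_q                         # (Nh, D)
    k = kv_tokens @ w_k                        # (Nl, D)
    v = kv_tokens @ w_v                        # (Nl, C)

    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (Nh, Nl)
    fused = q_tokens + attn @ v                # residual fusion
    return fused.T.reshape(c, hh, wh), attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, d = 8, 4
    hi = rng.standard_normal((c, 16, 16))  # spatially detailed features
    lo = rng.standard_normal((c, 4, 4))    # semantically rich features
    w_q = rng.standard_normal((c, d))
    w_k = rng.standard_normal((c, d))
    w_v = rng.standard_normal((c, c))
    fused, attn = cross_attention_fuse(hi, lo, w_q, w_k, w_v)
    print(fused.shape, attn.shape)
```

The output keeps the spatial resolution of the high-resolution input, so in this sketch the low-resolution features act purely as context for each high-resolution location.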

Problem

Research questions and friction points this paper is trying to address.

few-shot learning
medical image segmentation
data scarcity
domain shift
self-supervised features

Innovation

Methods, ideas, or system contributions that make the work stand out.

DINOv3
few-shot segmentation
wavelet augmentation
contextual fusion
self-supervised features
Guoping Xu
UTSW, WIT
Medical Image Segmentation, Disease Quantification, Computer Vision
Jayaram K. Udupa
Medical Image Processing Group (MIPG), Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104, USA
Weiguo Lu
The Medical Artificial Intelligence and Automation (MAIA) Laboratory, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
You Zhang
The Medical Artificial Intelligence and Automation (MAIA) Laboratory, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA