AI Summary
This work addresses domain shift in medical vascular segmentation: existing methods depend on large annotated datasets, which are scarce in clinical practice, and their performance degrades significantly under new imaging devices or protocols. To overcome this, the study adapts the 2D vision foundation model DINOv3 to 3D vascular segmentation, introducing a lightweight 3D adapter, a multi-scale 3D feature aggregator, and Z-channel embedding to preserve vascular continuity and enhance cross-domain robustness. With only five training samples, the method achieves a Dice score of 43.42%, a 30% relative improvement over nnU-Net (33.41%), and outperforms Transformer-based baselines such as SwinUNETR and UNETR by up to 45%; on out-of-distribution data, it attains a Dice of 21.37%, a 50% relative improvement over nnU-Net (14.22%).
Abstract
State-of-the-art vessel segmentation methods typically require large-scale annotated datasets and suffer from severe performance degradation under domain shifts. In clinical practice, however, acquiring extensive annotations for every new scanner or protocol is unfeasible. To address this, we propose a novel framework leveraging a pre-trained Vision Foundation Model (DINOv3) adapted for volumetric vessel segmentation. We introduce a lightweight 3D Adapter for volumetric consistency, a multi-scale 3D Aggregator for hierarchical feature fusion, and Z-channel embedding to effectively bridge the gap between 2D pre-training and 3D medical modalities, enabling the model to capture continuous vascular structures from limited data. We validated our method on the TopCoW (in-domain) and Lausanne (out-of-distribution) datasets. In the extreme few-shot regime with 5 training samples, our method achieved a Dice score of 43.42%, marking a 30% relative improvement over the state-of-the-art nnU-Net (33.41%) and outperforming other Transformer-based baselines, such as SwinUNETR and UNETR, by up to 45%. Furthermore, in the out-of-distribution setting, our model demonstrated superior robustness, achieving a 50% relative improvement over nnU-Net (21.37% vs. 14.22%), which suffered from severe domain overfitting. Ablation studies confirmed that our 3D adaptation mechanism and multi-scale aggregation strategy are critical for vascular continuity and robustness. Our results suggest foundation models offer a viable cold-start solution, improving clinical reliability under data scarcity or domain shifts.
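The relative improvements quoted above can be checked directly from the reported Dice scores. The following is a minimal sketch, not the authors' evaluation code: the function names `dice_score` and `relative_improvement` are hypothetical, the Dice helper uses the standard definition on flat binary masks, and treating two empty masks as a perfect match is an assumed convention. The "up to 45%" figure for SwinUNETR and UNETR cannot be checked this way, because their Dice scores are not given in the abstract.

```python
def dice_score(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks (0/1 values)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def relative_improvement(ours, baseline):
    """Relative gain of `ours` over `baseline`, in percent."""
    return 100.0 * (ours - baseline) / baseline


# Dice values (in %) quoted in the abstract: proposed method vs. nnU-Net.
few_shot = relative_improvement(43.42, 33.41)  # TopCoW, 5 training samples
ood = relative_improvement(21.37, 14.22)       # Lausanne, out-of-distribution

print(f"few-shot: {few_shot:.1f}%")
print(f"out-of-distribution: {ood:.1f}%")
```

Both results round to the 30% and 50% figures stated in the abstract. Note that these are relative gains; the absolute gains are about 10.0 and 7.2 Dice points, respectively.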