Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medical Research with Limited Datasets

📅 2025-09-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the scarcity of annotated data for carotid artery structure segmentation in cardiovascular histopathological images. Method: We systematically evaluate the performance and stability of mainstream segmentation models—including U-Net, DeepLabV3+, SegFormer, SAM, MedSAM, and MedSAM+UNet—under few-shot learning conditions. Using Bayesian hyperparameter optimization and multiple randomized data splits, we quantify variability in model rankings across different train-validation partitions. Contribution/Results: We demonstrate that model rankings under low-data regimes are highly sensitive to data partitioning, with observed performance differences primarily attributable to statistical noise rather than intrinsic algorithmic superiority. This challenges the validity of conventional benchmarking practices in clinical low-resource settings, revealing that standard benchmark scores poorly reflect real-world clinical utility. Crucially, this work provides the first quantitative evidence of evaluation instability in few-shot medical image segmentation and advocates for a paradigm shift toward clinically grounded, robust evaluation frameworks tailored for deployment-ready model assessment.

📝 Abstract
Accurate segmentation of carotid artery structures in histopathological images is vital for advancing cardiovascular disease research and diagnosis. However, deep learning model development in this domain is constrained by the scarcity of annotated cardiovascular histopathological data. This study presents a systematic evaluation of state-of-the-art deep learning segmentation models, including convolutional neural networks (U-Net, DeepLabV3+), a Vision Transformer (SegFormer), and recent foundation models (SAM, MedSAM, MedSAM+UNet), on a limited dataset of cardiovascular histology images. Despite employing an extensive hyperparameter optimization strategy with Bayesian search, our findings reveal that model performance is highly sensitive to data splits, with minor differences driven more by statistical noise than by true algorithmic superiority. This instability exposes the limitations of standard benchmarking practices in low-data clinical settings and challenges the assumption that performance rankings reflect meaningful clinical utility.
Problem

Research questions and friction points this paper is trying to address.

Evaluating deep learning models for small organ segmentation with limited datasets
Assessing model sensitivity to data splits in low-data clinical settings
Challenging performance ranking assumptions for clinical utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated multiple deep learning segmentation models
Employed Bayesian hyperparameter optimization strategy
Assessed model sensitivity to data splits
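The split-sensitivity idea above can be sketched in a few lines: repeat random train-validation partitions and count how often each model comes out on top. Everything below is illustrative only — the model names, the simulated Gaussian per-image Dice scores, and the `split_sensitivity` helper are assumptions, not the paper's actual code or data.

```python
import random
from statistics import mean

# Hypothetical per-image Dice scores are simulated here; in the paper the
# scores come from trained models (U-Net, DeepLabV3+, SegFormer, SAM, ...).
# This sketch only illustrates the evaluation protocol: repeat random
# validation splits and observe how often the model ranking changes.

def rank_models(scores_by_model):
    """Return model names sorted by mean validation score, best first."""
    return sorted(scores_by_model, key=lambda m: -mean(scores_by_model[m]))

def split_sensitivity(per_image_scores, n_splits=50, val_fraction=0.3, seed=0):
    """Count how often each model ranks first across random validation splits."""
    rng = random.Random(seed)
    n_images = len(next(iter(per_image_scores.values())))
    wins = {m: 0 for m in per_image_scores}
    for _ in range(n_splits):
        # Draw a fresh random validation subset and re-rank on it alone.
        val_idx = rng.sample(range(n_images), int(val_fraction * n_images))
        val_scores = {m: [s[i] for i in val_idx]
                      for m, s in per_image_scores.items()}
        wins[rank_models(val_scores)[0]] += 1
    return wins

# Two models with nearly identical simulated score distributions, as a
# stand-in for the "differences driven by statistical noise" scenario:
rng = random.Random(42)
scores = {
    "model_A": [min(1.0, max(0.0, rng.gauss(0.80, 0.08))) for _ in range(40)],
    "model_B": [min(1.0, max(0.0, rng.gauss(0.80, 0.08))) for _ in range(40)],
}
print(split_sensitivity(scores))
```

If the win counts are spread across models rather than concentrated on one, a single train-validation split would report a "winner" that other splits contradict — which is exactly the instability the paper quantifies.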
Phongsakon Mark Konrad
Centre for Industrial Software, University of Southern Denmark, Alsion 2, Sønderborg, 6400, Denmark
Andrei-Alexandru Popa
Centre for Industrial Mechanics, University of Southern Denmark, Alsion 2, Sønderborg, 6400, Denmark
Yaser Sabzehmeidani
Centre for Industrial Mechanics, University of Southern Denmark, Alsion 2, Sønderborg, 6400, Denmark
Liang Zhong
NHCS & Duke-NUS
Elisa A. Liehn
National Heart Center Singapore, 5 Hospital Dr, Singapore, 169609, Singapore
Serkan Ayvaz
Centre for Industrial Software, University of Southern Denmark, Alsion 2, Sønderborg, 6400, Denmark