Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medical Research with Limited Datasets

📅 2025-09-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the scarcity of annotated data for carotid artery structure segmentation in cardiovascular histopathological images. Method: We systematically evaluate the performance and stability of mainstream segmentation models—including U-Net, DeepLabV3+, SegFormer, SAM, MedSAM, and MedSAM+UNet—under few-shot learning conditions. Using Bayesian hyperparameter optimization and multiple randomized data splits, we quantify variability in model rankings across different train-validation partitions. Contribution/Results: We demonstrate that model rankings under low-data regimes are highly sensitive to data partitioning, with observed performance differences primarily attributable to statistical noise rather than intrinsic algorithmic superiority. This challenges the validity of conventional benchmarking practices in clinical low-resource settings, revealing that standard benchmark scores poorly reflect real-world clinical utility. Crucially, this work provides the first quantitative evidence of evaluation instability in few-shot medical image segmentation and advocates for a paradigm shift toward clinically grounded, robust evaluation frameworks tailored for deployment-ready model assessment.

📝 Abstract
Accurate segmentation of carotid artery structures in histopathological images is vital for advancing cardiovascular disease research and diagnosis. However, deep learning model development in this domain is constrained by the scarcity of annotated cardiovascular histopathological data. This study presents a systematic evaluation of state-of-the-art deep learning segmentation models, including convolutional neural networks (U-Net, DeepLabV3+), a Vision Transformer (SegFormer), and recent foundation models (SAM, MedSAM, MedSAM+UNet), on a limited dataset of cardiovascular histology images. Despite employing an extensive hyperparameter optimization strategy with Bayesian search, our findings reveal that model performance is highly sensitive to data splits, with minor differences driven more by statistical noise than by true algorithmic superiority. This instability exposes the limitations of standard benchmarking practices in low-data clinical settings and challenges the assumption that performance rankings reflect meaningful clinical utility.
Problem

Research questions and friction points this paper is trying to address.

Evaluating deep learning models for small organ segmentation with limited datasets
Assessing model sensitivity to data splits in low-data clinical settings
Challenging performance ranking assumptions for clinical utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated multiple deep learning segmentation models
Employed Bayesian hyperparameter optimization strategy
Assessed model sensitivity to data splits
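The split-sensitivity idea above can be sketched in a few lines: repeat random train-validation partitions and count how often each model comes out on top. Everything below is illustrative only — the model names, the simulated Gaussian per-image Dice scores, and the `split_sensitivity` helper are assumptions, not the paper's actual code or data.

```python
import random
from statistics import mean

# Hypothetical per-image Dice scores are simulated here; in the paper the
# scores come from trained models (U-Net, DeepLabV3+, SegFormer, SAM, ...).
# This sketch only illustrates the evaluation protocol: repeat random
# validation splits and observe how often the model ranking changes.

def rank_models(scores_by_model):
    """Return model names sorted by mean validation score, best first."""
    return sorted(scores_by_model, key=lambda m: -mean(scores_by_model[m]))

def split_sensitivity(per_image_scores, n_splits=50, val_fraction=0.3, seed=0):
    """Count how often each model ranks first across random validation splits."""
    rng = random.Random(seed)
    n_images = len(next(iter(per_image_scores.values())))
    wins = {m: 0 for m in per_image_scores}
    for _ in range(n_splits):
        # Draw a fresh random validation subset and re-rank on it alone.
        val_idx = rng.sample(range(n_images), int(val_fraction * n_images))
        val_scores = {m: [s[i] for i in val_idx]
                      for m, s in per_image_scores.items()}
        wins[rank_models(val_scores)[0]] += 1
    return wins

# Two models with nearly identical simulated score distributions, as a
# stand-in for the "differences driven by statistical noise" scenario:
rng = random.Random(42)
scores = {
    "model_A": [min(1.0, max(0.0, rng.gauss(0.80, 0.08))) for _ in range(40)],
    "model_B": [min(1.0, max(0.0, rng.gauss(0.80, 0.08))) for _ in range(40)],
}
print(split_sensitivity(scores))
```

If the win counts are spread across models rather than concentrated on one, a single train-validation split would report a "winner" that other splits contradict — which is exactly the instability the paper quantifies.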
Phongsakon Mark Konrad
Centre for Industrial Software, University of Southern Denmark, Alsion 2, Sønderborg, 6400, Denmark
Andrei-Alexandru Popa
Centre for Industrial Mechanics, University of Southern Denmark, Alsion 2, Sønderborg, 6400, Denmark
Yaser Sabzehmeidani
Centre for Industrial Mechanics, University of Southern Denmark, Alsion 2, Sønderborg, 6400, Denmark
Liang Zhong
NHCS & Duke-NUS
Elisa A. Liehn
National Heart Center Singapore, 5 Hospital Dr, Singapore, 169609, Singapore
Serkan Ayvaz
Centre for Industrial Software, University of Southern Denmark, Alsion 2, Sønderborg, 6400, Denmark