🤖 AI Summary
High-quality annotations in medical image segmentation are costly to acquire, yet their practical benefits remain unclear. Method: We systematically evaluate how label quality affects model performance and pretraining efficacy by generating multi-tier pseudo-label CT datasets with nnU-Net, TotalSegmentator, and MedSAM, and by conducting controlled experiments, cross-quality attribution analysis, and pretraining ablation studies. Contribution/Results: We identify, for the first time, a distinct performance threshold: in-domain segmentation accuracy improves significantly only when label quality exceeds a minimal threshold. Conversely, pretraining efficacy remains largely invariant to label quality, indicating that models rely more on general anatomical priors than on fine-grained annotations. Consequently, low-quality pseudo-labels suffice for effective pretraining; manual refinement yields measurable gains only when downstream tasks demand high boundary precision *and* label quality crosses this critical threshold, providing empirical guidance for allocating annotation resources.
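
As a concrete illustration of how such multi-tier pseudo-labels can be produced, here is a minimal sketch using TotalSegmentator's Python API: the default model yields a higher-quality tier, while the `fast` flag selects its lower-resolution model as a cheap lower-quality tier, and each tier is scored against a reference mask with Dice. All file paths and the choice of the liver structure are hypothetical, and API options may vary by TotalSegmentator version.

```python
# Sketch: generate two quality tiers of pseudo-labels and score them with Dice.
import numpy as np
import nibabel as nib
from totalsegmentator.python_api import totalsegmentator

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice overlap between two binary masks."""
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * inter / denom if denom else 1.0

ct_path = "case_0001_ct.nii.gz"                     # hypothetical input CT volume
totalsegmentator(ct_path, "labels_hq")              # default (higher-quality) tier
totalsegmentator(ct_path, "labels_lq", fast=True)   # low-resolution (lower-quality) tier

# Hypothetical manual ground-truth mask for one structure.
ref = nib.load("case_0001_liver_gt.nii.gz").get_fdata() > 0
for tier in ("labels_hq", "labels_lq"):
    pred = nib.load(f"{tier}/liver.nii.gz").get_fdata() > 0
    print(tier, f"Dice = {dice(ref, pred):.3f}")
```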
📝 Abstract
Improving label quality in medical image segmentation is costly, but its benefits remain unclear. We systematically evaluate its impact using multiple pseudo-labeled versions of CT datasets, generated by models such as nnU-Net, TotalSegmentator, and MedSAM. Our results show that while higher-quality labels improve in-domain performance, gains remain marginal below a small quality threshold. For pretraining, label quality has minimal impact, suggesting that models transfer general anatomical concepts rather than detailed annotations. These findings provide guidance on when improving label quality is worth the effort.