🤖 AI Summary
Ultrasound-based bone surface segmentation is hindered by a low signal-to-noise ratio, acoustic shadowing, and the difficulty of annotating anechoic regions, leading to high manual annotation costs, incomplete coverage, and consequently limited model generalizability and benchmark development. To address this, we propose a CT-guided, fully automated label-generation framework that integrates multimodal image registration with ultrasound physics modeling and is validated by clinical experts. The result is UltraBones100k, the first large-scale, high-fidelity bone segmentation dataset, comprising 100,000 lower-limb ultrasound images with complete coverage of previously hard-to-annotate regions. Wilcoxon signed-rank tests confirm a statistically significant improvement over manual annotations (p < 0.001); a model trained on our labels improves segmentation completeness by 320% at the 0.5 mm distance threshold and outperforms expert annotations in F1-score and accuracy. This work removes the annotation bottleneck, establishing a large-scale, high-precision benchmark and a reproducible technical paradigm for ultrasound bone segmentation.
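The 320% figure refers to a distance-thresholded completeness metric on the bone surface. As a point of reference, such metrics are commonly computed as the fraction of ground-truth surface points that lie within the threshold of some predicted point. A minimal Python sketch of this common definition (the paper's exact evaluation protocol and point representation are assumptions, not taken from the source):

```python
import math

def completeness(gt_points, pred_points, threshold_mm=0.5):
    """Fraction of ground-truth surface points that have a predicted
    point within `threshold_mm` (illustrative definition; the paper's
    exact protocol may differ)."""
    hits = sum(
        1
        for g in gt_points
        # Nearest-neighbor distance from this ground-truth point
        # to the predicted surface
        if min(math.dist(g, p) for p in pred_points) <= threshold_mm
    )
    return hits / len(gt_points)
```

For example, if only one of two ground-truth points has a predicted point within 0.5 mm, completeness is 0.5; a prediction that misses shadowed, low-intensity regions entirely scores low even when its visible-surface accuracy is high.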
📝 Abstract
Ultrasound-based bone surface segmentation is crucial in computer-assisted orthopedic surgery. However, ultrasound images have limitations, including a low signal-to-noise ratio and acoustic shadowing, which make interpretation difficult. Existing deep learning models for bone segmentation rely primarily on costly manual labeling by experts, limiting dataset size and model generalizability. Additionally, the complexity of ultrasound physics and acoustic shadowing makes the images difficult for humans to interpret, leading to incomplete labels in anechoic regions and limiting model performance. To advance ultrasound bone segmentation and establish effective model benchmarks, larger and higher-quality datasets are needed. We propose a methodology for collecting ex-vivo ultrasound datasets with automatically generated bone labels, including in anechoic regions. The initial labels are derived by accurately superimposing tracked bone CT models onto the tracked ultrasound images and are then refined to account for ultrasound physics. A clinical evaluation by an expert physician specializing in orthopedic sonography assesses the quality of the generated bone labels. A neural network for bone segmentation is trained on the collected dataset, and its predictions are compared with expert manual labels in terms of accuracy, completeness, and F1-score. We collected UltraBones100k, the largest known dataset of its kind: 100,000 ultrasound images of human lower limbs with bone labels. A Wilcoxon signed-rank test with Bonferroni correction confirmed that bone alignment with our method significantly improved the quality of bone labeling (p < 0.001). The model trained on UltraBones100k consistently outperforms manual labeling on all metrics, particularly in low-intensity regions (a 320% improvement in completeness at a distance threshold of 0.5 mm).
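The statistical validation pairs a Wilcoxon signed-rank test with a Bonferroni correction for multiple comparisons. In practice one would use `scipy.stats.wilcoxon`; the stdlib-only sketch below reimplements the test with the standard normal approximation purely to make the procedure concrete (the paper's exact data and comparison setup are not reproduced here):

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test via the normal
    approximation. Returns (W+, p). Illustrative reimplementation;
    use scipy.stats.wilcoxon in real analyses."""
    # Paired differences; zeros are discarded, as in the standard test
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    # Rank absolute differences, averaging ranks over ties
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    # W+ = sum of ranks of the positive differences
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    # Normal approximation for the two-sided p-value
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return w_plus, p

def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: each test is significant only if
    p < alpha / m, where m is the number of comparisons."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]
```

The correction simply tightens the per-test significance threshold, so the reported p < 0.001 holds even after accounting for testing multiple metrics or regions.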