🤖 AI Summary
This study addresses the challenges of organ segmentation in dynamic swallowing 4D-CT imaging and the absence of task-specific benchmarks. Methodologically, we present the first AI-driven organ segmentation system tailored for swallowing analysis: (1) we adapt nnU-Net—originally designed for static 3D medical image segmentation—to handle temporal 4D-CT swallowing sequences; (2) we introduce a leave-one-subject-out cross-validation strategy to enhance model generalizability; and (3) we establish the first dedicated 4D-CT swallowing segmentation benchmark, annotated with critical anatomical structures including the bolus, tongue, soft palate, and skeletal elements. Experimental results demonstrate median Dice scores ≥0.7 across key structures, enabling dynamic visualization and quantitative motion analysis of laryngeal anatomy—validating clinical utility. However, segmentation accuracy remains suboptimal for rapidly moving structures such as the thyroid cartilage and epiglottis, indicating a direction for future improvement.
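The leave-one-subject-out strategy described above can be sketched in a few lines. This is a minimal illustration, assuming five subjects each contributing one 4D-CT sequence (the subject IDs and data handling are hypothetical, not from the paper):

```python
def leave_one_subject_out(subject_ids):
    """Yield (train_subjects, held_out_subject) pairs, holding out each subject once."""
    for held_out in subject_ids:
        train = [s for s in subject_ids if s != held_out]
        yield train, held_out

# Illustrative subject IDs for the five swallowing 4D-CT datasets
subjects = ["s1", "s2", "s3", "s4", "s5"]
folds = list(leave_one_subject_out(subjects))
# Five folds: each subject is the test case exactly once, so every model
# is evaluated on a subject it never saw during training.
```

Because all frames from a subject stay on one side of the split, the per-fold Dice scores estimate generalization to unseen anatomy rather than to unseen frames of a familiar subject.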
📝 Abstract
This study presents the first report on the development of an artificial intelligence (AI) system for automatic region segmentation of four-dimensional computed tomography (4D-CT) images acquired during swallowing. The training material consisted of 4D-CT images taken during swallowing; additional 4D-CT images acquired during mastication and swallowing were used to verify the AI's practicality. Ground-truth segmentations were created from five 4D-CT swallowing datasets. A 3D convolutional nnU-Net model served as the AI, trained for 100 epochs and evaluated by leave-one-out cross-validation. Segmentation accuracy was assessed with the Dice coefficient. Regions with a median Dice coefficient of 0.7 or higher included the bolus, bones, tongue, and soft palate; the thyroid cartilage and epiglottis fell below 0.7. Factors that reduced the Dice coefficient included metal artifacts from dental crowns affecting the bolus and the rapid movement of the thyroid cartilage and epiglottis. In the practical verification, no significant misrecognition was observed for the facial bones, jaw bones, or tongue; however, the hyoid bone, thyroid cartilage, and epiglottis were not fully delineated during fast movement. Future research is expected to improve the AI's segmentation accuracy, but some risk of misrecognition will always remain, so tools for efficiently correcting the AI's segmentation results are needed. AI-based visualization is expected not only to deepen motion analysis of organs during swallowing but also to improve the accuracy of swallowing CT by clearly showing the current state of its precision.
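The Dice coefficient used for evaluation measures overlap between a predicted and a ground-truth mask: Dice = 2|A∩B| / (|A| + |B|). A minimal NumPy sketch for binary masks follows; the toy masks and the per-organ, per-frame evaluation details are illustrative assumptions, not the study's pipeline:

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks; 1.0 means perfect overlap."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy 2D masks standing in for one organ label in a single CT frame
pred  = np.array([[1, 1, 0],
                  [0, 1, 0]])
truth = np.array([[1, 1, 0],
                  [0, 0, 1]])
print(round(dice_coefficient(pred, truth), 2))  # 0.67 — below the 0.7 threshold
```

In the study, a score of this kind would be computed per anatomical label, and the median across cases compared against the 0.7 threshold reported above.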