🤖 AI Summary
Detecting multi-scale lymphoma lesions in whole-body PET/CT remains challenging due to anatomical variability and lesion heterogeneity.
Method: We propose a dual-modality detection framework integrating anatomical structural priors—specifically, organ segmentation masks generated by TotalSegmentator—into both nnDetection (a CNN-based detector) and a Swin Transformer-based detector. For the Swin Transformer, we adopt a two-stage training strategy: self-supervised pretraining followed by supervised fine-tuning. This work is the first to systematically evaluate the impact of such anatomical priors on lesion detection performance.
Contribution/Results: Anatomical priors significantly improve nnDetection’s performance (mAP increases by 4.2–6.8% on the AutoPET and Karolinska datasets), confirming their critical role in CNN-based modeling. In contrast, the Swin Transformer shows no comparable gain, suggesting that convolutional architectures are better suited to leveraging anatomical context. Our framework establishes an interpretable, reusable paradigm for integrating anatomical priors into multimodal medical image detection.
📝 Abstract
Early cancer detection is crucial for improving patient outcomes, and 18F-FDG PET/CT imaging plays a vital role by combining metabolic and anatomical information. Accurate lesion detection remains challenging due to the need to identify multiple lesions of varying sizes. In this study, we investigate the effect of adding anatomical prior information to deep learning-based lesion detection models. In particular, we add organ segmentation masks from the TotalSegmentator tool as auxiliary inputs to provide anatomical context to nnDetection, a state-of-the-art lesion detection framework, and to a Swin Transformer. The latter is trained in two stages that combine self-supervised pre-training and supervised fine-tuning. The method is evaluated on the AutoPET and Karolinska lymphoma datasets. The results indicate that including anatomical priors substantially improves detection performance within the nnDetection framework, while having almost no impact on the performance of the vision transformer. Moreover, we observe that the Swin Transformer does not offer clear advantages over the conventional convolutional neural network (CNN) encoders used in nnDetection. These findings highlight the critical role of anatomical context in cancer lesion detection, especially for CNN-based models.
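One common way to supply such anatomical priors is to one-hot encode the organ label map and concatenate it with the PET and CT volumes as extra input channels. The sketch below illustrates that idea only; the channel layout, organ count, and function name are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def build_input_volume(pet, ct, organ_mask, num_organs=5):
    """Stack PET, CT, and one-hot-encoded organ labels into one
    multi-channel volume that a detector can consume directly.

    pet, ct:     (D, H, W) float arrays (image intensities)
    organ_mask:  (D, H, W) int array with labels 0..num_organs
                 (0 = background, e.g. from TotalSegmentator output)
    Returns a (2 + num_organs, D, H, W) float32 array.
    NOTE: illustrative sketch; the actual channel layout and organ
    subset used in the paper may differ.
    """
    # One binary channel per organ label (background label 0 is dropped).
    one_hot = np.stack(
        [(organ_mask == i).astype(np.float32)
         for i in range(1, num_organs + 1)]
    )
    # Channels 0 and 1 are the imaging modalities; the rest are priors.
    return np.concatenate(
        [pet[None].astype(np.float32), ct[None].astype(np.float32), one_hot],
        axis=0,
    )
```

In practice the same idea plugs into either architecture by widening the first convolutional or patch-embedding layer to accept the extra channels.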