🤖 AI Summary
This study examines how reliably deep learning–based automatic segmentation of PSMA PET/CT supports quantitative analysis in prostate cancer patients with biochemical recurrence. It goes beyond conventional Dice coefficient evaluation to systematically assess six clinically relevant metrics: SUV<sub>max</sub>, SUV<sub>mean</sub>, total lesion activity (TLA), total metabolic tumor volume (TMTV), lesion count, and lesion dispersion. We propose an L1-weighted Dice Focal Loss (L1DFL) and compare it with Dice Loss, Dice Cross Entropy, and Dice Focal Loss on U-Net, Attention U-Net, and SegResNet. To evaluate clinical equivalence, we apply the Two One-Sided Tests (TOST) procedure (α = 0.05, Δ = 20%) together with Bland–Altman, Coverage Probability, and Total Deviation Index analyses. Attention U-Net with L1DFL achieves the strongest concordance with ground truth for SUV<sub>max</sub> and TLA (concordance correlation coefficient, CCC = 0.90–0.99). SUV metrics, lesion count, and TLA pass the equivalence test, whereas tumor volume and lesion dispersion show greater variability. The implementation code is publicly available.
📝 Abstract
This study comprehensively evaluates quantitative measurements extracted from automated deep-learning-based segmentations, going beyond traditional Dice Similarity Coefficient assessments. It focuses on six quantitative metrics: SUVmax, SUVmean, total lesion activity (TLA), tumor volume (TMTV), lesion count, and lesion spread. We analyzed 380 prostate-specific membrane antigen (PSMA) targeted [18F]DCFPyL PET/CT scans of patients with biochemical recurrence of prostate cancer. Three deep neural networks (U-Net, Attention U-Net, and SegResNet) were each trained with four loss functions: Dice Loss, Dice Cross Entropy, Dice Focal Loss, and our proposed L1-weighted Dice Focal Loss (L1DFL). Attention U-Net paired with L1DFL achieved the strongest correlation with the ground truth (concordance correlation = 0.90–0.99 for SUVmax and TLA), whereas models trained with Dice Loss or the other two compound losses underperformed, particularly with SegResNet. Equivalence testing (TOST, α = 0.05, Δ = 20%) confirmed high performance for the SUV metrics, lesion count, and TLA, with L1DFL yielding the best performance. By contrast, tumor volume and lesion spread exhibited greater variability. Bland-Altman, Coverage Probability, and Total Deviation Index analyses further indicated that L1DFL yields the least variability when quantifying the clinical measures relative to the ground truth. The code is publicly available at: https://github.com/ObedDzik/pca_segment.git.
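To illustrate two of the agreement statistics named above, the following is a minimal sketch of Lin's concordance correlation coefficient and of the Bland-Altman bias with 95% limits of agreement. It is not taken from the authors' repository. The function names and the sample SUVmax values are hypothetical, and the use of population (1/n) moments for the concordance coefficient is an assumption rather than a detail stated in the abstract.

```python
from statistics import fmean, stdev


def concordance_correlation(truth, pred):
    """Lin's concordance correlation coefficient for paired measurements.

    Uses population (1/n) variances and covariance; this choice is an
    assumption, not something specified in the abstract.
    """
    if len(truth) != len(pred) or len(truth) < 2:
        raise ValueError("need two equally long sequences with >= 2 pairs")
    n = len(truth)
    mean_t, mean_p = fmean(truth), fmean(pred)
    var_t = sum((t - mean_t) ** 2 for t in truth) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("no variation in either sequence; CCC is undefined")
    return 2 * cov / denom


def bland_altman(truth, pred):
    """Return (bias, lower limit, upper limit) for the differences pred - truth.

    The limits of agreement are bias +/- 1.96 sample standard deviations.
    """
    if len(truth) != len(pred) or len(truth) < 2:
        raise ValueError("need two equally long sequences with >= 2 pairs")
    diffs = [float(p - t) for t, p in zip(truth, pred)]
    bias = fmean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Hypothetical per-patient SUVmax values, for illustration only.
    manual = [4.2, 7.9, 12.5, 3.1, 20.4]
    predicted = [4.0, 8.3, 12.1, 3.4, 19.8]
    print("CCC:", concordance_correlation(manual, predicted))
    print("Bland-Altman (bias, lower, upper):", bland_altman(manual, predicted))
```

Unlike Pearson correlation, the concordance coefficient penalizes a systematic offset between the two measurements. A prediction that is shifted by a constant from the ground truth is perfectly correlated with it but has a concordance below 1.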