🤖 AI Summary
Deep learning models for Alzheimer’s disease (AD) diagnosis using 18F-FDG PET are trained largely on the North American ADNI cohort, yet their robustness in underrepresented populations, such as those in Latin America, remains underexplored. Method: We trained convolutional neural networks (CNNs) and vision transformers (ViTs) on ADNI and conducted external validation on the Argentine FLENI clinical cohort. We performed ablation studies and occlusion sensitivity analysis to identify key factors affecting generalization. Results: While models achieved high AUC (0.96–0.97) on ADNI, performance dropped substantially on FLENI (AUC 0.80–0.82), revealing a significant domain shift. ViTs conferred no clear advantage over CNNs. Occlusion maps showed that the models generally attend to canonical hypometabolic regions for the AD class, but their focus becomes unclear for the other classes and for FLENI scans. Per-image normalization and sampling strategy were identified as key factors for cross-population generalization. The study provides a systematic evaluation of how AD PET models generalize to a Latin American clinical cohort, with implications for population-aware validation, domain adaptation, and cohort diversification.
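As a rough illustration of what per-image normalization can mean in practice, the sketch below z-scores each volume with its own mean and standard deviation. The summary does not specify the exact scheme used in the study, so the function name `normalize_per_image`, the choice of z-scoring, and the synthetic volumes are all assumptions made for illustration.

```python
import numpy as np

def normalize_per_image(volume, eps=1e-8):
    """Z-score one 3D volume using only its own statistics.

    Hypothetical helper: the study's actual normalization may differ.
    """
    volume = np.asarray(volume, dtype=np.float64)
    mean = volume.mean()
    std = volume.std()
    # eps guards against division by zero for constant volumes
    return (volume - mean) / (std + eps)

# Synthetic stand-ins for two scans of the same subject acquired with
# different global intensity scale and offset (e.g., different scanners).
rng = np.random.default_rng(0)
scan_a = rng.normal(loc=5.0, scale=2.0, size=(8, 8, 8))
scan_b = 3.0 * scan_a + 10.0

norm_a = normalize_per_image(scan_a)
norm_b = normalize_per_image(scan_b)
```

Because each volume is rescaled with its own statistics, a global gain or offset difference between sites cancels out, which is one plausible reason such a step could matter for cross-cohort generalization.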
📝 Abstract
Deep learning models have shown strong performance in diagnosing Alzheimer's disease (AD) from neuroimaging data, particularly 18F-FDG PET scans, but their training datasets are largely composed of North American cohorts such as the Alzheimer's Disease Neuroimaging Initiative (ADNI). Their generalization to underrepresented populations remains underexplored. In this study, we benchmark convolutional and Transformer-based models on the ADNI dataset and assess their generalization performance on a novel Latin American clinical cohort from the FLENI Institute in Buenos Aires, Argentina. We show that while all models achieve high AUCs on ADNI (up to 0.96 and 0.97), their performance drops substantially on FLENI (down to 0.82 and 0.80, respectively), revealing a significant domain shift. The tested architectures performed similarly, calling into question the supposed advantages of transformers for this specific task. Through ablation studies, we identify per-image normalization and an appropriate sampling strategy as key factors for generalization. Occlusion sensitivity analysis further reveals that models trained on ADNI generally attend to canonical hypometabolic regions for the AD class, but their focus becomes unclear for the other classes and for FLENI scans. These findings highlight the need for population-aware validation of diagnostic AI models and motivate future work on domain adaptation and cohort diversification.
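To make the idea of occlusion sensitivity concrete, here is a minimal sketch for a 3D volume: each patch is masked in turn, and the resulting drop in the model's score is recorded at that location. This is a generic version of the technique, not the authors' implementation; `occlusion_sensitivity`, `toy_predict`, the patch size, and the fill value are hypothetical choices, and the toy "model" simply averages one fixed region.

```python
import numpy as np

def occlusion_sensitivity(predict, volume, patch=4, stride=4, fill=0.0):
    """Map of score drops when each patch of `volume` is replaced by `fill`.

    `predict` is any callable returning a scalar score for a volume
    (an assumption; a real model would return a class probability).
    """
    volume = np.asarray(volume, dtype=np.float64)
    baseline = predict(volume)
    heat = np.zeros_like(volume)
    counts = np.zeros_like(volume)
    depth, height, width = volume.shape
    for z in range(0, depth, stride):
        for y in range(0, height, stride):
            for x in range(0, width, stride):
                region = (slice(z, z + patch),
                          slice(y, y + patch),
                          slice(x, x + patch))
                occluded = volume.copy()
                occluded[region] = fill
                # A large drop means the model relied on this region.
                heat[region] += baseline - predict(occluded)
                counts[region] += 1
    # Average over overlapping patches (relevant when stride < patch).
    return heat / np.maximum(counts, 1)

def toy_predict(vol):
    """Toy stand-in for a classifier: mean intensity of one corner region."""
    return float(vol[0:4, 0:4, 0:4].mean())

volume = np.ones((8, 8, 8))
heatmap = occlusion_sensitivity(toy_predict, volume)
```

In this toy setup only the corner region that `toy_predict` reads receives a nonzero score drop. With a real classifier, a diffuse or inconsistent map would correspond to the unclear focus described for the non-AD classes and the FLENI scans.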