AI Summary
This study addresses a limitation of existing methods for predicting conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD): they rely solely on cross-sectional structural MRI and therefore overlook each patient's longitudinal anatomical trajectory. The authors propose the Temporal Adaptive Fusion Network (TAF-Net), a hybrid CNN-Transformer architecture built around an adaptive temporal gating mechanism. TAF-Net takes paired longitudinal 3D MRI scans and fuses three representations: explicit structural change, region-to-region temporal cross-attention, and bilateral feature concatenation. Evaluated on the ADNI cohort for three-year MCI-to-AD conversion prediction using only structural MRI, TAF-Net outperforms the strongest baseline and approaches the performance of multimodal methods that require PET, CSF, or genetic data. It is also data-efficient, matching baseline performance with a fraction of the training data, and ablation studies show that longitudinal fusion reduces predictive variance by 48% compared with single-timepoint evaluation.
Abstract
Predicting conversion from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) is critical for early intervention. Current deep learning paradigms predominantly rely on cross-sectional structural MRI, neglecting prognostic value in patient-specific anatomical trajectories. We introduce the Temporal Adaptive Fusion Network (TAF-Net), a hybrid CNN-Transformer architecture that models paired longitudinal 3D MRI scans. Central to TAF-Net is a Temporal Fusion Module governed by an Adaptive Temporal Gate, which learns patient-specific weightings to synthesize three spatiotemporal representations: explicit structural change, region-to-region temporal cross-attention, and bilateral feature concatenation. Evaluated on the Alzheimer's Disease Neuroimaging Initiative cohort for three-year MCI-to-AD conversion prediction, TAF-Net achieved the highest discriminative performance among all evaluated methods using only structural MRI, significantly outperforming the strongest baseline and approaching multimodal methods requiring PET, CSF, or genetic data. The architecture exhibited exceptional data efficiency, matching baseline performance with a fraction of training data. Ablation studies demonstrate that longitudinal fusion improves discrimination while reducing predictive variance by 48% compared to single-timepoint evaluation. Interpretability analyses reveal spatial attention aligned with established AD pathology in the medial temporal lobe and ventricles, while the gating mechanism prioritizes explicit volumetric change with strong positive correlation to conversion risk.
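The abstract describes the Adaptive Temporal Gate only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that the three branch representations (structural change, temporal cross-attention, bilateral concatenation) have already been projected to a common feature length, and that the gate is a single linear layer followed by a softmax. Both choices, along with the names `adaptive_temporal_gate`, `gate_weights`, and `gate_bias`, are hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_temporal_gate(change, cross_attn, concat, gate_weights, gate_bias):
    """Fuse three equal-length feature vectors with patient-specific weights.

    ASSUMPTION: the gate is one linear layer plus softmax; the paper may use
    a different parameterisation. All names here are hypothetical.
    """
    branches = [change, cross_attn, concat]

    # The gate sees every branch feature for this patient, flattened.
    gate_input = [value for branch in branches for value in branch]

    # One linear score per branch: row . gate_input + bias.
    scores = [
        sum(w * x for w, x in zip(row, gate_input)) + bias
        for row, bias in zip(gate_weights, gate_bias)
    ]

    # Softmax turns the scores into weights that are positive and sum to 1.
    weights = softmax(scores)

    # Fused feature i is the weighted sum of feature i across the branches.
    fused = [
        sum(w * branch[i] for w, branch in zip(weights, branches))
        for i in range(len(change))
    ]
    return fused, weights
```

Because the weights depend on the patient's own features, two patients can receive different mixtures of the three branches. Under this sketch, the abstract's finding that the gate prioritizes explicit volumetric change would correspond to a larger first entry in `weights`.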