🤖 AI Summary
To address the difficulty of fusing multimodal MRI (anatomical MRI, diffusion tensor imaging, functional MRI) with clinical scale data, and the resulting low diagnostic accuracy in real-world clinical settings, this paper proposes a Transformer-based Mixture-of-Experts (MoE) framework. The method uses modality-specific expert networks and a learnable soft gating mechanism that weights each expert's output per sample, enabling dynamic cross-modal feature fusion. It is presented as the first work to apply the MoE architecture to real clinical classification of neurological disorders (NDs), and it incorporates 3D patch embedding and a multi-source data alignment strategy. On the validation set, the model achieves an accuracy of 82.47%, outperforming the best baseline by over 10 percentage points. This improvement strengthens differential diagnosis of phenotypically overlapping disorders, particularly Alzheimer’s disease and Parkinson’s disease, suggesting clinical applicability and robustness in complex multimodal neuroimaging analysis.
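The soft gating step summarized above can be illustrated with a minimal NumPy sketch. It assumes each modality expert (for example aMRI, DTI, fMRI, and a clinical-data expert) has already produced a fixed-length feature vector, and that the gate is a single linear layer followed by a softmax over the concatenated expert outputs. The function names, shapes, and this gate parameterization are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def soft_gated_fusion(expert_feats, gate_W, gate_b):
    """Fuse per-modality expert outputs with a soft gate (hypothetical sketch).

    expert_feats: (batch, n_experts, d), one d-dim vector per modality expert.
    gate_W: (n_experts * d, n_experts) gate weights (assumed linear gate).
    gate_b: (n_experts,) gate bias.
    Returns fused features (batch, d) and gate weights (batch, n_experts).
    """
    batch, n_experts, d = expert_feats.shape
    # The gate sees the concatenation of all expert outputs for each sample
    gate_in = expert_feats.reshape(batch, n_experts * d)
    weights = softmax(gate_in @ gate_W + gate_b, axis=-1)  # rows sum to 1
    # Per-sample weighted sum over experts
    fused = np.einsum("be,bed->bd", weights, expert_feats)
    return fused, weights

# Toy usage: 2 samples, 4 hypothetical experts, 8-dim features each
rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 4, 8))
W = rng.standard_normal((4 * 8, 4)) * 0.1
b = np.zeros(4)
fused, w = soft_gated_fusion(feats, W, b)
print(fused.shape, w.shape)  # (2, 8) (2, 4)
```

A softmax gate keeps every expert's contribution nonnegative and summing to one, so the gate weights can be read as per-sample modality importance. A sparse top-k gate, as used in many large MoE models, would be a different design choice.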
📝 Abstract
The integration of multi-modal Magnetic Resonance Imaging (MRI) and clinical data holds great promise for enhancing the diagnosis of neurological disorders (NDs) in real-world clinical settings. Deep Learning (DL) has recently emerged as a powerful tool for extracting meaningful patterns from medical data to aid in diagnosis. However, existing DL approaches struggle to effectively leverage multi-modal MRI and clinical data, leading to suboptimal performance. To address this challenge, we use a unique, proprietary multi-modal clinical dataset curated for ND research. Based on this dataset, we propose a novel transformer-based Mixture-of-Experts (MoE) framework for ND classification that leverages multiple MRI modalities (anatomical MRI (aMRI), Diffusion Tensor Imaging (DTI), and functional MRI (fMRI)) alongside clinical assessments. Our framework employs transformer encoders to capture spatial relationships within volumetric MRI data and uses modality-specific experts for targeted feature extraction. A gating mechanism with adaptive fusion dynamically integrates the expert outputs to improve predictive performance. Comprehensive experiments and comparisons with multiple baselines demonstrate that our multi-modal approach significantly enhances diagnostic accuracy, particularly in distinguishing overlapping disease states. Our framework achieves a validation accuracy of 82.47%, outperforming baseline methods by over 10%, highlighting its potential to improve ND diagnosis by applying multi-modal learning to real-world clinical data.
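The idea of a transformer encoder over volumetric MRI can be sketched as follows: cut the 3D volume into non-overlapping cubic patches, project each flattened patch to a token, add a position embedding, and apply self-attention so every patch can attend to every other. This sketch assumes a single-channel volume whose sides are divisible by the patch size, uses random (untrained) weights, and shows only one single-head attention layer. The patch size, embedding width, and function names are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def patchify_3d(volume, p):
    """Split a (D, H, W) volume into non-overlapping p x p x p patches.

    Returns an array of shape (num_patches, p**3), patches in raster order.
    """
    D, H, W = volume.shape
    if D % p or H % p or W % p:
        raise ValueError("each volume dimension must be divisible by p")
    v = volume.reshape(D // p, p, H // p, p, W // p, p)
    v = v.transpose(0, 2, 4, 1, 3, 5)  # (nD, nH, nW, p, p, p)
    return v.reshape(-1, p ** 3)

def embed_patches(volume, p, W_proj, pos_emb):
    """Linear patch embedding plus a (hypothetical) learned position embedding."""
    return patchify_3d(volume, p) @ W_proj + pos_emb  # (N, d_model)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # each row sums to 1
    return attn @ v

# Toy usage with random weights (illustrative sizes only)
rng = np.random.default_rng(0)
vol = rng.standard_normal((32, 32, 32))  # stand-in for one MRI volume
p, d_model = 8, 64
n_patches = (32 // p) ** 3               # 64 patches
W_proj = rng.standard_normal((p ** 3, d_model)) * 0.02
pos_emb = rng.standard_normal((n_patches, d_model)) * 0.02
tokens = embed_patches(vol, p, W_proj, pos_emb)
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))
out = self_attention(tokens, Wq, Wk, Wv)
print(tokens.shape, out.shape)  # (64, 64) (64, 64)
```

Non-overlapping patches keep the token count at (D/p)(H/p)(W/p), which is what makes full self-attention over a 3D volume tractable.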