🤖 AI Summary
fMRI-based clinical diagnosis faces challenges including low signal-to-noise ratio, substantial inter-subject variability, insufficient modeling of frequency-domain information by existing CNN- or Transformer-based models, and absence of textual annotations describing regional brain activation and functional connectivity. To address these issues, we propose RTGMFF—a novel multimodal framework featuring: (i) the first ROI-level, brain-region-driven text generation mechanism to construct reproducible semantic representations; (ii) an adaptive semantic alignment module integrating a hybrid frequency-spatial encoder—comprising wavelet-enhanced Mamba for spectral modeling and a cross-scale Transformer for spatial contextualization—to bridge the modality gap; and (iii) a regularized cosine similarity loss for precise multimodal embedding alignment. Evaluated on ADHD-200 and ABIDE datasets, RTGMFF achieves statistically significant improvements over state-of-the-art methods in diagnostic accuracy, sensitivity, specificity, and AUC, effectively mitigating both missing-modality bias and inter-individual heterogeneity.
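The summary describes condensing each subject's activation, connectivity, age, and sex into reproducible text tokens. The paper's actual implementation is not shown here; the following is a minimal illustrative sketch of what such a deterministic ROI-to-token step could look like, where the token format, `top_k` cutoff, and thresholds are all hypothetical choices, not the authors' method:

```python
import numpy as np

def roi_text_tokens(activation, connectivity, age, sex, roi_names, top_k=2):
    """Deterministically condense per-ROI fMRI statistics into text tokens.

    activation:   (R,) mean activation per ROI
    connectivity: (R, R) symmetric functional-connectivity matrix
    Determinism (stable sorts, fixed tie-breaks) is what makes the
    resulting text representation reproducible across runs.
    """
    tokens = [f"AGE_{age}", f"SEX_{sex}"]  # demographic tokens
    # Rank ROIs by activation; stable sort gives a fixed tie-break by index.
    order = np.argsort(-activation, kind="stable")[:top_k]
    for i in order:
        level = "HIGH" if activation[i] > 0 else "LOW"  # hypothetical threshold
        tokens.append(f"{roi_names[i]}_ACT_{level}")
    # Report the single strongest functional connection (upper triangle only).
    iu = np.triu_indices(len(roi_names), k=1)
    j = int(np.argmax(np.abs(connectivity[iu])))
    a, b = iu[0][j], iu[1][j]
    tokens.append(f"{roi_names[a]}_CONN_{roi_names[b]}")
    return tokens
```

Because every step is a deterministic function of the input statistics, two runs on the same subject yield the identical token sequence, which is the reproducibility property the summary emphasizes.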
📝 Abstract
Functional magnetic resonance imaging (fMRI) is a powerful tool for probing brain function, yet reliable clinical diagnosis is hampered by low signal-to-noise ratios, inter-subject variability, and the limited frequency awareness of prevailing CNN- and Transformer-based models. Moreover, most fMRI datasets lack textual annotations that could contextualize regional activation and connectivity patterns. We introduce RTGMFF, a framework that unifies automatic ROI-level text generation with multimodal feature fusion for brain-disorder diagnosis. RTGMFF consists of three components: (i) an ROI-driven fMRI text generation module that deterministically condenses each subject's activation, connectivity, age, and sex into reproducible text tokens; (ii) a hybrid frequency-spatial encoder that fuses a hierarchical wavelet-Mamba branch with a cross-scale Transformer encoder to capture frequency-domain structure alongside long-range spatial dependencies; and (iii) an adaptive semantic alignment module that embeds the ROI token sequence and visual features in a shared space, using a regularized cosine-similarity loss to narrow the modality gap. Extensive experiments on the ADHD-200 and ABIDE benchmarks show that RTGMFF surpasses current methods in diagnostic accuracy, achieving notable gains in sensitivity, specificity, and area under the ROC curve. Code is available at https://github.com/BeistMedAI/RTGMFF.
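The alignment objective in component (iii) is described as a regularized cosine-similarity loss over paired text and visual embeddings. A minimal sketch of one common form of such a loss follows; the exact regularizer and its weight `lam` here are assumptions for illustration, not the paper's definition:

```python
import numpy as np

def alignment_loss(text_emb, vis_emb, lam=0.01):
    """Cosine-similarity alignment loss with an L2 penalty on raw embeddings.

    text_emb, vis_emb: (N, D) paired text / visual embeddings.
    Minimizing the first term pulls each pair toward cosine similarity 1;
    the penalty (a hypothetical regularizer) discourages unbounded norms.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = vis_emb / np.linalg.norm(vis_emb, axis=1, keepdims=True)
    cos = np.sum(t * v, axis=1)          # per-pair cosine similarity in [-1, 1]
    loss = np.mean(1.0 - cos)            # 0 when every pair is perfectly aligned
    reg = lam * (np.mean(text_emb ** 2) + np.mean(vis_emb ** 2))
    return loss + reg
```

With `lam=0`, the loss is exactly 0 for identical embeddings and 2 for diametrically opposed ones, which makes the alignment term easy to sanity-check in isolation.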