🤖 AI Summary
fMRI-based clinical diagnosis faces challenges including low signal-to-noise ratio, substantial inter-subject variability, insufficient modeling of frequency-domain information by existing CNN- or Transformer-based models, and absence of textual annotations describing regional brain activation and functional connectivity. To address these issues, we propose RTGMFF—a novel multimodal framework featuring: (i) the first ROI-level, brain-region-driven text generation mechanism to construct reproducible semantic representations; (ii) an adaptive semantic alignment module integrating a hybrid frequency-spatial encoder—comprising wavelet-enhanced Mamba for spectral modeling and a cross-scale Transformer for spatial contextualization—to bridge the modality gap; and (iii) a regularized cosine similarity loss for precise multimodal embedding alignment. Evaluated on ADHD-200 and ABIDE datasets, RTGMFF achieves statistically significant improvements over state-of-the-art methods in diagnostic accuracy, sensitivity, specificity, and AUC, effectively mitigating both missing-modality bias and inter-individual heterogeneity.
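The summary describes condensing each subject's activation, connectivity, age, and sex into reproducible text tokens. The paper's actual implementation is not shown here; the following is a minimal illustrative sketch of what such a deterministic ROI-to-token step could look like, where the token format, `top_k` cutoff, and thresholds are all hypothetical choices, not the authors' method:

```python
import numpy as np

def roi_text_tokens(activation, connectivity, age, sex, roi_names, top_k=2):
    """Deterministically condense per-ROI fMRI statistics into text tokens.

    activation:   (R,) mean activation per ROI
    connectivity: (R, R) symmetric functional-connectivity matrix
    Determinism (stable sorts, fixed tie-breaks) is what makes the
    resulting text representation reproducible across runs.
    """
    tokens = [f"AGE_{age}", f"SEX_{sex}"]  # demographic tokens
    # Rank ROIs by activation; stable sort gives a fixed tie-break by index.
    order = np.argsort(-activation, kind="stable")[:top_k]
    for i in order:
        level = "HIGH" if activation[i] > 0 else "LOW"  # hypothetical threshold
        tokens.append(f"{roi_names[i]}_ACT_{level}")
    # Report the single strongest functional connection (upper triangle only).
    iu = np.triu_indices(len(roi_names), k=1)
    j = int(np.argmax(np.abs(connectivity[iu])))
    a, b = iu[0][j], iu[1][j]
    tokens.append(f"{roi_names[a]}_CONN_{roi_names[b]}")
    return tokens
```

Because every step is a deterministic function of the input statistics, two runs on the same subject yield the identical token sequence, which is the reproducibility property the summary emphasizes.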
📝 Abstract
Functional magnetic resonance imaging (fMRI) is a powerful tool for probing brain function, yet reliable clinical diagnosis is hampered by low signal-to-noise ratios, inter-subject variability, and the limited frequency awareness of prevailing CNN- and Transformer-based models. Moreover, most fMRI datasets lack textual annotations that could contextualize regional activation and connectivity patterns. We introduce RTGMFF, a framework that unifies automatic ROI-level text generation with multimodal feature fusion for brain-disorder diagnosis. RTGMFF consists of three components: (i) an ROI-driven fMRI text generation module that deterministically condenses each subject's activation, connectivity, age, and sex into reproducible text tokens; (ii) a hybrid frequency-spatial encoder that fuses a hierarchical wavelet-Mamba branch with a cross-scale Transformer encoder to capture frequency-domain structure alongside long-range spatial dependencies; and (iii) an adaptive semantic alignment module that embeds the ROI token sequence and visual features in a shared space, using a regularized cosine-similarity loss to narrow the modality gap. Extensive experiments on the ADHD-200 and ABIDE benchmarks show that RTGMFF surpasses current methods in diagnostic accuracy, achieving notable gains in sensitivity, specificity, and area under the ROC curve. Code is available at https://github.com/BeistMedAI/RTGMFF.
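The alignment objective in component (iii) is described as a regularized cosine-similarity loss over paired text and visual embeddings. A minimal sketch of one common form of such a loss follows; the exact regularizer and its weight `lam` here are assumptions for illustration, not the paper's definition:

```python
import numpy as np

def alignment_loss(text_emb, vis_emb, lam=0.01):
    """Cosine-similarity alignment loss with an L2 penalty on raw embeddings.

    text_emb, vis_emb: (N, D) paired text / visual embeddings.
    Minimizing the first term pulls each pair toward cosine similarity 1;
    the penalty (a hypothetical regularizer) discourages unbounded norms.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = vis_emb / np.linalg.norm(vis_emb, axis=1, keepdims=True)
    cos = np.sum(t * v, axis=1)          # per-pair cosine similarity in [-1, 1]
    loss = np.mean(1.0 - cos)            # 0 when every pair is perfectly aligned
    reg = lam * (np.mean(text_emb ** 2) + np.mean(vis_emb ** 2))
    return loss + reg
```

With `lam=0`, the loss is exactly 0 for identical embeddings and 2 for diametrically opposed ones, which makes the alignment term easy to sanity-check in isolation.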