RTGMFF: Enhanced fMRI-based Brain Disorder Diagnosis via ROI-driven Text Generation and Multimodal Feature Fusion

📅 2025-09-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
fMRI-based clinical diagnosis faces challenges including low signal-to-noise ratio, substantial inter-subject variability, insufficient modeling of frequency-domain information by existing CNN- or Transformer-based models, and absence of textual annotations describing regional brain activation and functional connectivity. To address these issues, we propose RTGMFF—a novel multimodal framework featuring: (i) the first ROI-level, brain-region-driven text generation mechanism to construct reproducible semantic representations; (ii) an adaptive semantic alignment module integrating a hybrid frequency-spatial encoder—comprising wavelet-enhanced Mamba for spectral modeling and a cross-scale Transformer for spatial contextualization—to bridge the modality gap; and (iii) a regularized cosine similarity loss for precise multimodal embedding alignment. Evaluated on ADHD-200 and ABIDE datasets, RTGMFF achieves statistically significant improvements over state-of-the-art methods in diagnostic accuracy, sensitivity, specificity, and AUC, effectively mitigating both missing-modality bias and inter-individual heterogeneity.

Technology Category

Application Category

📝 Abstract
Functional magnetic resonance imaging (fMRI) is a powerful tool for probing brain function, yet reliable clinical diagnosis is hampered by low signal-to-noise ratios, inter-subject variability, and the limited frequency awareness of prevailing CNN- and Transformer-based models. Moreover, most fMRI datasets lack textual annotations that could contextualize regional activation and connectivity patterns. We introduce RTGMFF, a framework that unifies automatic ROI-level text generation with multimodal feature fusion for brain-disorder diagnosis. RTGMFF consists of three components: (i) ROI-driven fMRI text generation deterministically condenses each subject's activation, connectivity, age, and sex into reproducible text tokens; (ii) Hybrid frequency-spatial encoder fuses a hierarchical wavelet-mamba branch with a cross-scale Transformer encoder to capture frequency-domain structure alongside long-range spatial dependencies; and (iii) Adaptive semantic alignment module embeds the ROI token sequence and visual features in a shared space, using a regularized cosine-similarity loss to narrow the modality gap. Extensive experiments on the ADHD-200 and ABIDE benchmarks show that RTGMFF surpasses current methods in diagnostic accuracy, achieving notable gains in sensitivity, specificity, and area under the ROC curve. Code is available at https://github.com/BeistMedAI/RTGMFF.
Problem

Research questions and friction points this paper is trying to address.

Improves fMRI-based brain disorder diagnostic accuracy
Addresses low signal-to-noise and inter-subject variability issues
Integrates ROI text generation with multimodal feature fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

ROI-driven text generation for fMRI data
Hybrid frequency-spatial encoder with wavelet-mamba
Adaptive semantic alignment with regularized cosine loss
🔎 Similar Papers
No similar papers found.
Junhao Jia
Junhao Jia
Hangzhou Dianzi University
Explainable AI (XAI)Interpretable Computer VisionMedical Image Analysis
Y
Yifei Sun
Zhejiang University, Hangzhou, China
Y
Yunyou Liu
Hangzhou Dianzi University, Hangzhou, China
C
Cheng Yang
Hangzhou Dianzi University, Hangzhou, China
C
Changmiao Wang
Shenzhen Research Institute of Big Data, Shenzhen, China
Feiwei Qin
Feiwei Qin
Prof. College of Computer Science, Hangzhou Dianzi University
Artificial IntelligenceComputer-Aided DesignComputer VisionMedical Image Analysis
Y
Yong Peng
Hangzhou Dianzi University, Hangzhou, China
W
Wenwen Min
Yunnan University, Kunming, China