M3D-BFS: a Multi-stage Dynamic Fusion Strategy for Sample-Adaptive Multi-Modal Brain Network Analysis

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing static fusion strategies in multimodal brain network analysis, which fail to adaptively adjust the fusion process according to input samples, thereby constraining model performance. To overcome this, the study introduces a dynamic fusion mechanism into multimodal brain network modeling for the first time, proposing a multi-stage dynamic fusion framework. This framework employs Mixture-of-Experts (MoE) modules tailored separately for unimodal and multimodal representations, enabling input-driven adaptive fusion during inference. Furthermore, a three-stage progressive training strategy combined with a multimodal disentanglement loss is incorporated to effectively mitigate expert collapse. Extensive experiments on multiple real-world brain network datasets demonstrate that the proposed method significantly outperforms current static fusion approaches, confirming its effectiveness and superiority.
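The summary describes mixture-of-experts (MoE) modules whose routing changes per input sample. As a rough illustration of that idea, here is a minimal NumPy sketch of sample-adaptive MoE fusion over SC/FC representations: a gating network produces per-sample expert weights, so different inputs route through different expert mixtures. All names and dimensions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MoEFusion:
    """Toy mixture-of-experts fusion layer (illustrative only).

    Each expert is a random linear map; a gating network produces
    per-sample expert weights, so the effective fusion computation
    changes with the input sample.
    """

    def __init__(self, d_in, d_out, n_experts):
        self.experts = [rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
                        for _ in range(n_experts)]
        self.gate_w = rng.standard_normal((d_in, n_experts)) / np.sqrt(d_in)

    def __call__(self, z_sc, z_fc):
        # concatenate uni-modal (SC, FC) representations per sample
        z = np.concatenate([z_sc, z_fc], axis=-1)          # (batch, d_in)
        gates = softmax(z @ self.gate_w)                   # (batch, n_experts)
        expert_outs = np.stack([z @ w for w in self.experts], axis=1)
        # weighted sum of expert outputs; weights differ per sample
        fused = (gates[..., None] * expert_outs).sum(axis=1)
        return fused, gates

# two samples with different inputs get different expert weightings
moe = MoEFusion(d_in=8, d_out=4, n_experts=3)
z_sc = rng.standard_normal((2, 4))
z_fc = rng.standard_normal((2, 4))
fused, gates = moe(z_sc, z_fc)
```

In a real model the experts and gate would be learned jointly; this sketch only shows the routing mechanics that make the computation input-dependent.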
📝 Abstract
Multi-modal fusion is of great significance in neuroscience: it integrates information from different modalities and can achieve better performance than uni-modal methods on downstream tasks. Current multi-modal fusion methods for brain networks, which mainly focus on the structural connectivity (SC) and functional connectivity (FC) modalities, are static in nature: they feed different samples into the same model with identical computation, ignoring inherent differences between input samples. This lack of sample adaptation limits further gains in model performance. To this end, we propose a multi-stage dynamic fusion strategy (M3D-BFS) for sample-adaptive multi-modal brain network analysis. Unlike static fusion methods, we design separate mixtures-of-experts (MoEs) for uni- and multi-modal representations, so that the active modules adapt to each input sample during inference. To alleviate the expert-collapse problem of MoE training, we divide our method into three stages: we first train the uni-modal encoders separately, then pretrain the individual experts of the MoEs, and finally fine-tune the whole model. A multi-modal disentanglement loss is designed to enhance the final representations. To the best of our knowledge, this is the first work on dynamic fusion for multi-modal brain network analysis. Extensive experiments on several real-world datasets demonstrate the superiority of M3D-BFS.
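The abstract's three-stage schedule (train uni-modal encoders, pretrain individual experts, fine-tune everything) can be sketched as a simple table of which modules receive gradient updates per stage. Module names below are placeholders assumed for illustration, not the paper's code.

```python
# Which modules are updated in each training stage (placeholder names).
# Stage 1: train the uni-modal encoders separately.
# Stage 2: pretrain the individual MoE experts with encoders frozen.
# Stage 3: fine-tune the whole model jointly.
STAGES = {
    1: {"sc_encoder", "fc_encoder"},
    2: {"uni_modal_experts", "multi_modal_experts"},
    3: {"sc_encoder", "fc_encoder", "uni_modal_experts",
        "multi_modal_experts", "gating", "classifier"},
}

def trainable(module: str, stage: int) -> bool:
    """Return True if `module` receives gradient updates during `stage`."""
    return module in STAGES[stage]
```

In a framework like PyTorch this table would typically drive `requires_grad` flags or optimizer parameter groups before each stage begins.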
Problem

Research questions and friction points this paper is trying to address.

multi-modal fusion
sample adaptation
brain network analysis
dynamic fusion
structural and functional connectivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic fusion
sample-adaptive
mixture-of-experts
multi-modal brain network
multi-stage training
Rui Dong
Ph.D. candidate, University of Michigan
program synthesis, formal methods, program verification
Xiaotong Zhang
School of Computer Science and Engineering, Southeast University
Jiaxing Li
School of Computer Science and Engineering, Southeast University
Yueying Li
School of Computer Science and Engineering, Southeast University
Jiayin Wei
School of Computer Science and Engineering, Southeast University
Youyong Kong
Associate Professor at School of Computer Science and Engineering, Southeast University
medical image processing, machine learning, brain network analysis