Coarse-to-fine Hierarchical Architecture with Sequential Mamba for Brain Reconstruction

📅 2026-06-03

🤖 AI Summary
This work aims to establish a correspondence between deep visual models and the hierarchical organization of the human visual cortex, with the goal of accurately predicting fMRI brain activity from images. To this end, the authors propose CHASMBrain, a framework that brings Mamba to image-to-fMRI encoding through a dual-stream architecture that separately models global semantics and local spatial information. Combining a Mamba-based variational autoencoder (Mamba-VAE) with a two-stage coarse-to-fine strategy, the approach first predicts region-of-interest (ROI) activations and then refines them to voxel-level predictions. Causal branch-ablation experiments link the two streams to functional subdivisions of the visual cortex, and cross-subject transfer experiments indicate that the backbone learns a shared visual representation. Evaluated on the Natural Scenes Dataset (NSD), the method achieves a Pearson correlation coefficient of 0.429 and a mean squared error of 0.261, outperforming all evaluated baselines, including ridge regression and DINOv2 linear probes.
📝 Abstract
Understanding the relationship between deep visual representations and the human visual system is a fundamental challenge in computational neuroscience. While modern vision models achieve strong performance in image recognition, their correspondence with the hierarchical organization of the human visual cortex remains an open question. In this study, we propose CHASMBrain, a novel hierarchical two-stage framework for image-to-fMRI encoding. Our architecture leverages a dual-stream Mamba design to explicitly separate and process global semantic tokens and local spatial patches, motivated by the functional organization of the visual cortex. A coarse-to-fine strategy is employed: Stage 1 predicts denoised ROI-level activations, while Stage 2 refines these coarse responses into full voxel-level predictions using a Mamba-VAE. Experiments on the Natural Scenes Dataset (NSD) demonstrate that our method achieves a Pearson correlation of 0.429 and an MSE of 0.261, outperforming all evaluated baselines, including ridge regression and DINOv2 linear probes. Beyond predictive performance, causal branch-ablation experiments reveal an asymmetric specialization: the patch stream is specifically locked to early visual cortex (retinotopic regions), while the CLS stream contributes broader semantic context to higher-order areas. This correspondence holds causally, not merely correlationally. Cross-subject transfer experiments further show that the learned backbone generalizes across individuals with minimal per-subject adaptation, suggesting the model captures a shared, subject-agnostic visual representation.
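The two reported metrics can be illustrated with a minimal NumPy sketch. The abstract does not state how the Pearson correlation is aggregated, so this example assumes it is computed per voxel across test images and then averaged; the function names `voxelwise_pearson` and `mse` are hypothetical and not taken from the paper.

```python
import numpy as np

def voxelwise_pearson(pred, true):
    """Pearson r for each voxel; inputs have shape (n_samples, n_voxels)."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    # Center each voxel's responses across samples.
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    # Assumption: voxels with zero variance are assigned r = 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def mse(pred, true):
    """Mean squared error over all samples and voxels."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean((pred - true) ** 2))
```

Under this assumption, a single summary score would be `voxelwise_pearson(pred, true).mean()`; other aggregation choices (for example, per-ROI averages) are equally plausible readings of the abstract.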
Problem

Research questions and friction points this paper is trying to address.

brain reconstruction
visual cortex
fMRI encoding
hierarchical representation
computational neuroscience
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba architecture
coarse-to-fine hierarchy
dual-stream modeling
causal ablation
cross-subject generalization
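
The branch-ablation idea listed above can be sketched on a toy model. This is not the CHASMBrain architecture: it assumes a hypothetical two-stream linear encoder in which CLS and patch features are mapped to voxels and summed, and it measures how much the mean voxel-wise correlation drops when one stream's input is zeroed. All names (`predict`, `mean_pearson`, `ablation_drop`) are illustrative.

```python
import numpy as np

def predict(cls_feats, patch_feats, w_cls, w_patch, ablate=None):
    """Toy two-stream linear encoder; `ablate` zeroes one stream's input."""
    if ablate == "cls":
        cls_feats = np.zeros_like(cls_feats)
    elif ablate == "patch":
        patch_feats = np.zeros_like(patch_feats)
    return cls_feats @ w_cls + patch_feats @ w_patch

def mean_pearson(pred, true):
    """Mean voxel-wise Pearson r; zero-variance voxels count as r = 0."""
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    r = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(r.mean())

def ablation_drop(cls_feats, patch_feats, w_cls, w_patch, true):
    """Drop in mean correlation when each stream is removed in turn."""
    full = mean_pearson(predict(cls_feats, patch_feats, w_cls, w_patch), true)
    return {
        name: full - mean_pearson(
            predict(cls_feats, patch_feats, w_cls, w_patch, ablate=name), true
        )
        for name in ("cls", "patch")
    }
```

Applied to a subset of voxels such as a single ROI, a large drop for one stream and a small drop for the other would be the kind of asymmetry the abstract describes for early visual cortex.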
Hoang-Son Vo
AI Convergence, Chonnam National University
Computer Vision, Medical Image Processing, Image Generation, 3D Image
Van-Hung Bui
Chonnam National University, Gwangju, Republic of Korea
Minh-Huy Mai-Duc
Vietnam National University - Ho Chi Minh City, University of Science, Vietnam
Tien-Dung Mai
University of Information Technology - VNUHCM
Soo-Hyung Kim
Chonnam National University, Gwangju, Republic of Korea