🤖 AI Summary
This study addresses the high cost and limited scalability of task-based fMRI acquisition by proposing Rest2Visual, the first model to directly predict vision-evoked 3D brain activation maps (ve-fMRI) from resting-state fMRI (rs-fMRI). Methodologically, it employs a volumetric encoder-decoder architecture that integrates image embeddings and modulates multi-scale 3D features via adaptive normalization. To enable training, the authors construct the first large-scale paired rs-fMRI/ve-fMRI dataset. Rest2Visual preserves subject-specific neuroanatomical structure and generates high-fidelity, individualized functional responses. Quantitatively, predicted activations achieve strong representational similarity to ground-truth ve-fMRI across multiple neural encoding metrics. Furthermore, the generated activations successfully support downstream stimulus image reconstruction. This work establishes a novel paradigm for task-free, scalable functional brain modeling: it bypasses the need for explicit sensory stimulation while enabling population-level inference and personalized functional mapping.
📝 Abstract
Understanding how spontaneous brain activity relates to stimulus-driven neural responses is a fundamental challenge in cognitive neuroscience. While task-based functional magnetic resonance imaging (fMRI) captures localized stimulus-evoked brain activation, its acquisition is costly, time-consuming, and difficult to scale across populations. In contrast, resting-state fMRI (rs-fMRI) is task-free and abundant, but lacks direct interpretability. We introduce Rest2Visual, a conditional generative model that predicts visually evoked fMRI (ve-fMRI) from resting-state input and 2D visual stimuli. It follows a volumetric encoder-decoder design, where multiscale 3D features from rs-fMRI are modulated by image embeddings via adaptive normalization, enabling spatially accurate, stimulus-specific activation synthesis. To enable model training, we construct a large-scale triplet dataset from the Natural Scenes Dataset (NSD), aligning each rs-fMRI volume with stimulus images and their corresponding ve-fMRI activation maps. Quantitative evaluation shows that the predicted activations closely match ground truth across standard similarity and representational metrics, and support successful image reconstruction in downstream decoding. Notably, the predicted maps preserve subject-specific structure, demonstrating the model's capacity to generate individualized functional surrogates. Our results provide compelling evidence that individualized spontaneous neural activity can be transformed into stimulus-aligned representations, opening new avenues for scalable, task-free functional brain modeling.
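The conditioning mechanism described in the abstract — modulating 3D feature volumes with parameters derived from an image embedding via adaptive normalization — can be sketched as a FiLM-style operation. The snippet below is a minimal NumPy illustration under assumed shapes; the function and variable names (`adaptive_norm`, `w_gamma`, `w_beta`) are hypothetical and not the paper's actual implementation:

```python
import numpy as np

def adaptive_norm(features, embedding, w_gamma, w_beta, eps=1e-5):
    """FiLM-style adaptive normalization (illustrative sketch):
    normalize a (C, D, H, W) feature volume per channel, then
    scale/shift it with parameters predicted from a stimulus-image
    embedding."""
    # Per-channel normalization over the spatial dimensions.
    mean = features.mean(axis=(1, 2, 3), keepdims=True)
    std = features.std(axis=(1, 2, 3), keepdims=True)
    normed = (features - mean) / (std + eps)
    # Predict per-channel scale (gamma) and shift (beta) from the embedding
    # with simple linear projections.
    gamma = embedding @ w_gamma  # shape (C,)
    beta = embedding @ w_beta    # shape (C,)
    return normed * gamma[:, None, None, None] + beta[:, None, None, None]

# Toy usage: a 16-channel 8x8x8 rs-fMRI feature volume modulated by a
# 64-dimensional image embedding (all values random for illustration).
rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8, 8, 8))
emb = rng.standard_normal(64)
out = adaptive_norm(feats, emb,
                    rng.standard_normal((64, 16)),
                    rng.standard_normal((64, 16)))
print(out.shape)  # (16, 8, 8, 8)
```

In the model this modulation would be applied at multiple decoder scales, so the same stimulus embedding can steer coarse and fine 3D features alike; the sketch shows a single scale only.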