From Eye to Mind: brain2text Decoding Reveals the Neural Mechanisms of Visual Semantic Processing

📅 2025-03-15
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses two key limitations in fMRI-based brain decoding: overreliance on visual reconstruction and insufficient neuroscientific interpretation of semantic processing. We propose the first end-to-end fMRI-to-text semantic decoding paradigm that bypasses intermediate visual reconstruction entirely. Methodologically, we integrate cross-modal semantic alignment training, deep learning–based fMRI signal modeling, and fine-grained neuroanatomical localization analysis. Key contributions include: (1) the first systematic demonstration of cooperative involvement of MT+, ventral visual cortex, and inferior parietal lobule in vision-to-semantic transformation; (2) empirical validation of separable neural representations for high-level semantic dimensions—including animacy and motion; and (3) state-of-the-art (SOTA) semantic decoding performance, with generated text accurately capturing scene-level semantics. Our framework establishes a high-fidelity, interpretable observational pathway for investigating cortical semantic encoding mechanisms.
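The cross-modal semantic alignment training mentioned above is not specified in detail here; a common way to realize it is a symmetric InfoNCE (CLIP-style) contrastive objective that pulls each fMRI embedding toward the embedding of its paired caption. The sketch below is an illustrative assumption, not the paper's implementation; the function name, batch shapes, and temperature value are hypothetical.

```python
import numpy as np

def info_nce_loss(fmri_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning paired fMRI and caption embeddings.

    fmri_emb, text_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalise so dot products are cosine similarities
    f = fmri_emb / np.linalg.norm(fmri_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = f @ t.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(logits))     # matched pairs sit on the diagonal

    def xent(l):
        # cross-entropy of each row's softmax against the diagonal target
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the fMRI->text and text->fMRI directions
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy check: perfectly aligned embeddings score far better than random ones
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_aligned = info_nce_loss(emb, emb)
loss_random = info_nce_loss(emb, rng.normal(size=(8, 16)))
```

Training the fMRI encoder against frozen caption embeddings in this way is what lets the model be "trained without visual input": images never enter the loss, only brain signals and text.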

📝 Abstract
Deciphering the neural mechanisms that transform sensory experiences into meaningful semantic representations is a fundamental challenge in cognitive neuroscience. While neuroimaging has mapped a distributed semantic network, the format and neural code of semantic content remain elusive, particularly for complex, naturalistic stimuli. Traditional brain decoding, focused on visual reconstruction, primarily captures low-level perceptual features, missing the deeper semantic essence guiding human cognition. Here, we introduce a paradigm shift by directly decoding fMRI signals into textual descriptions of viewed natural images. Our novel deep learning model, trained without visual input, achieves state-of-the-art semantic decoding performance, generating meaningful captions that capture the core semantic content of complex scenes. Neuroanatomical analysis reveals the critical role of higher-level visual regions, including MT+, ventral stream visual cortex, and inferior parietal cortex, in this semantic transformation. Category-specific decoding further demonstrates nuanced neural representations for semantic dimensions like animacy and motion. This text-based decoding approach provides a more direct and interpretable window into the brain's semantic encoding than visual reconstruction, offering a powerful new methodology for probing the neural basis of complex semantic processing, refining our understanding of the distributed semantic network, and potentially inspiring brain-inspired language models.
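Judging whether a generated caption "captures the core semantic content" of a scene requires comparing it against ground-truth descriptions. The snippet below illustrates the idea with a crude bag-of-words cosine similarity; an actual evaluation would use learned sentence embeddings or standard captioning metrics, so treat this purely as an assumed, simplified stand-in.

```python
from collections import Counter
import math

def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity between two captions using bag-of-words counts.

    A deliberately simple stand-in for learned sentence embeddings,
    shown only to illustrate scoring decoded captions against references.
    """
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)          # shared-word overlap
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical example captions (not from the paper's dataset)
decoded = "a dog running across a grassy field"
reference = "a dog runs through the grass"
unrelated = "a red car parked on the street"
```

A semantically faithful decoder should score its output closer to the matched reference than to unrelated scenes, which is the intuition behind evaluating semantic decoding in text space rather than pixel space.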
Problem

Research questions and friction points this paper is trying to address.

Decoding neural activity into visual semantic content
Systematically integrating brain decoding with neuroscientific theory
Exploring neural mechanisms underlying visual semantic processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decodes fMRI signals into textual descriptions
Deep learning model trained without visual information
Reveals neural mechanisms in visual semantic processing
Feihan Feng
Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education Center for Studies of Psychological Application, South China Normal University; Guangzhou, 510631, China.
Jingxin Nie
School of Psychology, South China Normal University
Neuroimaging