Versatile Framework with Semantic and Structural guidance for Image Reconstruction from Brain Activity

📅 2026-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing brain-to-image reconstruction methods struggle to preserve semantic content and fine-grained structural details (such as position, orientation, and size) at the same time, which limits their controllability and interpretability. This work proposes MindDiffuser, a two-stage framework. Stage 1 feeds CLIP text embeddings decoded from brain responses into Stable Diffusion to generate a semantically coherent preliminary image. Stage 2 uses shallow CLIP visual features, also decoded from brain activity, as supervisory signals and iteratively refines the Stage 1 feature vectors via backpropagation to align structural information. In experiments on fMRI, EEG, and MEG datasets, the framework significantly improves on previous state-of-the-art models, and spatial and temporal visualizations support its neurobiological plausibility.
📝 Abstract
Reconstructing visual stimuli from brain recordings is a meaningful and challenging task in brain decoding. In particular, precise and controllable image reconstruction is important for advancing the development and use of brain-computer interfaces. Recent methods, leveraging advances in text-to-image generation models, have reconstructed images that closely approximate complex natural stimuli in terms of semantics (e.g., concepts and objects). However, they struggle to maintain consistency with the original stimuli in fine-grained structural information (e.g., position, orientation and size), which undermines both the controllability and interpretability of the models. To address these issues, we propose a two-stage image reconstruction framework, termed MindDiffuser. In Stage 1, Contrastive Language-Image Pretraining (CLIP) text embeddings decoded from brain responses are input into Stable Diffusion, generating a preliminary image containing semantic information. In Stage 2, we use decoded shallow CLIP visual features as supervisory signals, iteratively refining the feature vectors from Stage 1 via backpropagation to align structural information. We conducted extensive experiments on brain response datasets across three modalities (fMRI, EEG, MEG) elicited by visual stimuli, demonstrating that our framework significantly improves on the performance of previous state-of-the-art models and highlighting the effectiveness and versatility of our approach. Spatial and temporal visualization results further support the neurobiological plausibility of our framework, providing guidance for future neural decoding efforts across different brain signal modalities.
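The Stage 2 idea in the abstract (refining Stage 1 feature vectors by backpropagation so that the generated image's shallow features match the decoded ones) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the matrices `G` and `F` are small random linear stand-ins for an image generator and a shallow feature extractor, not Stable Diffusion or CLIP, and `target` plays the role of the features decoded from brain activity. The gradient is written out by hand because the stand-ins are linear; the actual framework would presumably rely on automatic differentiation through the real models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear stand-ins (NOT Stable Diffusion or CLIP).
G = rng.normal(size=(16, 8)) / np.sqrt(8)    # "generator": latent (8) -> image (16)
F = rng.normal(size=(6, 16)) / np.sqrt(16)   # "shallow feature extractor": image (16) -> features (6)


def structural_loss(z, target):
    """Squared error between features of the generated image and the target features."""
    residual = F @ (G @ z) - target
    return 0.5 * float(residual @ residual)


def refine(z0, target, steps=200):
    """Gradient descent on the latent vector to reduce the structural loss."""
    A = F @ G                                 # combined latent -> feature map
    lr = 1.0 / np.linalg.norm(A, 2) ** 2      # step size 1/L for this quadratic loss
    z = z0.copy()
    for _ in range(steps):
        grad = A.T @ (A @ z - target)         # analytic gradient of the loss w.r.t. z
        z -= lr * grad
    return z


# Assumed inputs: a Stage 1 latent and "decoded" target features (simulated here).
z_stage1 = rng.normal(size=8)
z_true = rng.normal(size=8)
target = F @ (G @ z_true)

z_refined = refine(z_stage1, target)
print("loss before:", structural_loss(z_stage1, target))
print("loss after: ", structural_loss(z_refined, target))
```

The sketch only shows the optimization pattern: the generator and feature extractor stay fixed, and only the latent vector is updated against a feature-matching loss.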
Problem

Research questions and friction points this paper is trying to address.

image reconstruction
brain decoding
structural information
semantic consistency
brain-computer interface
Innovation

Methods, ideas, or system contributions that make the work stand out.

image reconstruction from brain activity
semantic-structural guidance
two-stage framework
CLIP feature decoding
cross-modal brain decoding
Yizhuo Lu
Institute of Automation, Chinese Academy of Sciences
artificial intelligence, neural encoding and decoding
Changde Du
Institute of Automation, Chinese Academy of Sciences
machine learning, computer vision, computational neuroscience, brain-computer interface (BCI), artificial intelligence
Qiongyi Zhou
State Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Liuyun Jiang
State Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Huiguang He
Institute of Automation, Chinese Academy of Sciences
artificial intelligence, medical image processing, brain-computer interface