🤖 AI Summary
This study addresses high-fidelity reconstruction of perceived faces from fMRI data and tackles two key limitations: the difficulty of extracting high-level semantic features (e.g., identity, expression, gender) and poor cross-subject generalization. We propose a dual-stream GAN architecture (Double-Flow GAN) that strengthens the discriminator and mitigates the imbalance caused by image domains that are too easy for the generator, improving reconstruction consistency. A cross-modal pretraining paradigm conditions the generative model on visual features extracted from images, so that it can be pretrained on a larger image-only dataset and then transferred to fMRI conditions. Additionally, a lightweight pretrained fMRI alignment module alleviates the variation in brain structure across subjects and improves cross-subject generalization. The method achieves state-of-the-art performance on multiple quantitative metrics and reconstructs identity, expression, and gender attributes from fMRI with high consistency.
📝 Abstract
Faces play an important role in human visual perception. Reconstructing perceived faces from brain activity is challenging because it requires extracting high-level features and keeping multiple face attributes, such as expression, identity, and gender, consistent. In this study, we proposed a novel reconstruction framework, Double-Flow GAN, which strengthens the discriminator and handles the imbalance caused by image domains that are too easy for the generator. We also designed a pretraining process that uses features extracted from images as conditions, which makes it possible to pretrain the fMRI-conditioned reconstruction model on a larger image-only dataset. Moreover, we developed a simple pretrained model for fMRI alignment to alleviate the difficulty of cross-subject reconstruction caused by variations in brain structure among subjects. We compared the proposed method with traditional reconstruction models. Results showed that the proposed method accurately reconstructs multiple face attributes, outperforms previous reconstruction models, and achieves state-of-the-art reconstruction performance.
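The abstract does not describe how its pretrained fMRI alignment model works. For orientation only, a common baseline for the same cross-subject problem is linear functional alignment: given two subjects' responses to the same shared stimuli, fit a ridge-regression map from one subject's voxel space into a reference subject's voxel space, then project new responses through it. The sketch below assumes NumPy, synthetic data, and the hypothetical helper names `fit_ridge_alignment` and `align`; it illustrates this generic baseline and is not the authors' alignment model.

```python
import numpy as np


def fit_ridge_alignment(x_src: np.ndarray, x_ref: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Fit W minimizing ||x_src @ W - x_ref||^2 + lam * ||W||^2.

    x_src: (n_stimuli, n_voxels_src) responses of the new subject.
    x_ref: (n_stimuli, n_voxels_ref) responses of the reference subject
           to the SAME stimuli, in the same row order.
    """
    n_vox = x_src.shape[1]
    gram = x_src.T @ x_src + lam * np.eye(n_vox)
    # Solve (X^T X + lam I) W = X^T Y directly instead of forming an inverse.
    return np.linalg.solve(gram, x_src.T @ x_ref)


def align(x_src: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Project new-subject responses into the reference subject's voxel space."""
    return x_src @ w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_shared, v_src, v_ref = 200, 50, 40
    # Synthetic ground-truth mapping between two subjects (illustration only).
    true_w = rng.normal(size=(v_src, v_ref))
    x_src = rng.normal(size=(n_shared, v_src))
    x_ref = x_src @ true_w + 0.01 * rng.normal(size=(n_shared, v_ref))

    w = fit_ridge_alignment(x_src, x_ref, lam=1e-3)
    # Held-out responses from the new subject, mapped into the reference space.
    x_new = rng.normal(size=(10, v_src))
    err = np.abs(align(x_new, w) - x_new @ true_w).max()
    print(f"max alignment error on held-out samples: {err:.4f}")
```

Solving the regularized normal equations with `np.linalg.solve` avoids an explicit matrix inverse. With real fMRI data the voxel count often exceeds the number of shared stimuli, so the penalty `lam` would typically be chosen by cross-validation rather than fixed as in this toy example.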