🤖 AI Summary
This survey systematically examines the alignment between deep neural networks and human brain function, covering neural encoding (stimulus → brain response) and decoding (brain signals → semantic or perceptual reconstruction) across language, vision, and audition. Methodologically, it presents a unified cross-modal view integrating fMRI, MEG, and EEG data with Transformers, CNNs, and diffusion models, drawing on canonical correlation analysis (CCA), representational similarity analysis (RSA), and adversarial training; it further proposes multidimensional criteria for evaluating brain-alignment quality and incorporates comparative evidence from animal models alongside neuroethics considerations. Key contributions include: (1) documenting strong correspondence between high-level semantic model representations and hierarchical gradients in temporal and prefrontal cortices; (2) reviewing high-fidelity image and speech reconstruction, with reported ImageNet-class decoding accuracy exceeding 85%; and (3) outlining new paradigms for clinical brain–computer interfaces and early diagnosis of cognitive disorders.
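The RSA technique mentioned above compares a model's representational geometry with the brain's: each set of features is reduced to a representational dissimilarity matrix (RDM) over stimulus pairs, and the two RDMs are then correlated. A minimal sketch on synthetic data (the array shapes and the use of Pearson correlation for both steps are illustrative choices, not specifics from this survey):

```python
import numpy as np

def rdm(features):
    # features: (n_stimuli, n_units). Dissimilarity between two stimuli is
    # 1 - Pearson r between their activation patterns.
    return 1.0 - np.corrcoef(features)

def rsa_score(feats_a, feats_b):
    # Correlate the two RDMs over their off-diagonal (upper-triangle) pairs.
    iu = np.triu_indices(feats_a.shape[0], k=1)
    return np.corrcoef(rdm(feats_a)[iu], rdm(feats_b)[iu])[0, 1]

rng = np.random.default_rng(0)
brain = rng.standard_normal((20, 100))  # 20 stimuli x 100 "voxels" (synthetic)

# Same representational geometry (affine rescaling) -> RDMs match, score ~= 1.
print(rsa_score(2.0 * brain + 1.0, brain))
# Unrelated features -> score near 0.
print(rsa_score(rng.standard_normal((20, 50)), brain))
```

Because RSA only compares pairwise-dissimilarity structure, it can relate representations of different dimensionality (here 50 model units vs. 100 voxels) without fitting any mapping between them.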
📝 Abstract
Can artificial intelligence unlock the secrets of the human brain? How do the inner mechanisms of deep learning models relate to our neural circuits? Is it possible to enhance AI by tapping into the power of brain recordings? These captivating questions lie at the heart of an emerging field at the intersection of neuroscience and artificial intelligence. Our survey dives into this exciting domain, focusing on human brain recording studies and cutting-edge cognitive neuroscience datasets that capture brain activity during natural language processing, visual perception, and auditory experiences. We explore two fundamental approaches: encoding models, which attempt to generate brain activity patterns from sensory inputs, and decoding models, which aim to reconstruct our thoughts and perceptions from neural signals. These techniques not only promise breakthroughs in neurological diagnostics and brain-computer interfaces but also offer a window into the very nature of cognition. In this survey, we first discuss popular representations of language, vision, and speech stimuli, and present a summary of neuroscience datasets. We then review how recent advances in deep learning have transformed this field, examining the popular deep-learning-based encoding and decoding architectures and noting their benefits and limitations across different sensory modalities. From text to images, speech to videos, we investigate how these models capture the brain's response to our complex, multimodal world. While our primary focus is on human studies, we also highlight the crucial role of animal models in advancing our understanding of neural mechanisms. Throughout, we discuss the ethical implications of these powerful technologies, addressing concerns about privacy and cognitive liberty. We conclude with a summary and discussion of future trends in this rapidly evolving field.
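A common instantiation of the encoding models described above is regularized linear regression from stimulus features (e.g., a deep model's layer activations) to each voxel's response, evaluated by per-voxel correlation on held-out stimuli. A minimal sketch on synthetic data (dimensions, the ridge penalty, and the simulated linear ground truth are all illustrative assumptions, not details from this survey):

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_test, n_feat, n_vox = 200, 50, 64, 500

# Synthetic "stimulus embeddings" and simulated voxel responses:
# a linear map of the features plus Gaussian measurement noise.
X = rng.standard_normal((n_train + n_test, n_feat))
W_true = rng.standard_normal((n_feat, n_vox))
Y = X @ W_true + 2.0 * rng.standard_normal((n_train + n_test, n_vox))

X_tr, X_te = X[:n_train], X[n_train:]
Y_tr, Y_te = Y[:n_train], Y[n_train:]

# Ridge regression in closed form: W = (X'X + lam*I)^-1 X'Y
lam = 10.0
W = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(n_feat), X_tr.T @ Y_tr)
Y_hat = X_te @ W

def per_voxel_r(y_hat, y):
    # Pearson r between predicted and measured response, one value per voxel.
    yh, yy = y_hat - y_hat.mean(0), y - y.mean(0)
    return (yh * yy).sum(0) / (np.linalg.norm(yh, axis=0) * np.linalg.norm(yy, axis=0))

r = per_voxel_r(Y_hat, Y_te)
print(round(float(r.mean()), 3))  # mean prediction accuracy across voxels
```

The held-out correlation map is what encoding studies typically project onto the cortical surface to show which regions a given feature space predicts well; the ridge penalty matters because the feature dimensionality often rivals the number of stimuli.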