🤖 AI Summary
Reconstructing visual stimuli from noisy, spatially diffuse, and temporally variable EEG signals remains challenging. To address this, we propose a cross-modal EEG-to-image decoding framework that embeds spatiotemporal Transformer-extracted EEG representations into the attention mechanism of a latent diffusion model (LDM), enabling attention-guided and interpretable feature fusion. This work establishes the first synergistic modeling of Transformer-derived neural representations and LDM attention, eliminating reliance on fixed stimulus sets (a key limitation of prior methods) and substantially improving semantic interpretability and cross-category generalization. On public benchmarks, our approach achieves up to a 6.5% improvement in latent-space clustering accuracy and up to an 11.8% gain in zero-shot generalization to unseen classes, while keeping Inception Score and Fréchet Inception Distance (FID) comparable to existing baselines.
📝 Abstract
Advances in neuroscience and artificial intelligence have enabled preliminary decoding of brain activity. Despite this progress, the interpretability of neural representations remains limited. A significant challenge arises from the intrinsic properties of electroencephalography (EEG) signals, including high noise levels, spatial diffusion, and pronounced temporal variability. To interpret the neural mechanisms underlying thought, we propose a Transformer-based framework that extracts spatiotemporal representations associated with observed visual stimuli from EEG recordings. These features are subsequently incorporated into the attention mechanisms of Latent Diffusion Models (LDMs) to facilitate the reconstruction of visual stimuli from brain activity. Quantitative evaluations on publicly available benchmark datasets demonstrate that the proposed method excels at modeling semantic structure in EEG signals, achieving up to a 6.5% increase in latent-space clustering accuracy and an 11.8% increase in zero-shot generalization across unseen classes, while maintaining Inception Score and Fréchet Inception Distance comparable to existing baselines. Our work marks a significant step towards generalizable semantic interpretation of EEG signals.
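The abstract does not spell out how EEG features enter the LDM attention layers. As a rough illustration only, the sketch below shows generic single-head cross-attention, the usual way conditioning signals are injected into latent diffusion models: queries come from the image latents, while keys and values come from EEG tokens. All names, shapes, and the random weights (`cross_attention`, `w_q`, `w_k`, `w_v`, `d_attn`) are assumptions made for this example, not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random stand-in for learned weights or features (illustrative only)."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(rows) for _ in range(cols)]
            for _ in range(rows)]


def cross_attention(latents, eeg_tokens, w_q, w_k, w_v):
    """Let image latents attend to EEG tokens.

    latents:    n_latent x d_latent, queries come from the diffusion latents
    eeg_tokens: n_eeg x d_eeg, keys and values come from the EEG encoder
    Returns the attended features and the attention weights.
    """
    q = matmul(latents, w_q)       # n_latent x d_attn
    k = matmul(eeg_tokens, w_k)    # n_eeg x d_attn
    v = matmul(eeg_tokens, w_v)    # n_eeg x d_attn
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]          # transpose: d_attn x n_eeg
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]    # each row sums to 1
    return matmul(weights, v), weights


# Hypothetical toy sizes: 4 latent positions, 6 EEG tokens.
rng = random.Random(0)
n_latent, d_latent = 4, 8
n_eeg, d_eeg = 6, 5
d_attn = 3

latents = random_matrix(n_latent, d_latent, rng)
eeg_tokens = random_matrix(n_eeg, d_eeg, rng)
w_q = random_matrix(d_latent, d_attn, rng)
w_k = random_matrix(d_eeg, d_attn, rng)
w_v = random_matrix(d_eeg, d_attn, rng)

attended, weights = cross_attention(latents, eeg_tokens, w_q, w_k, w_v)
```

Each row of `weights` is a distribution over EEG tokens for one latent position, which is one way such a fusion step could be inspected for interpretability. A real system would use learned projections, multiple heads, and a tensor library instead of nested lists.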