🤖 AI Summary
This work addresses the challenge of reconstructing visual images from low-density, high-noise non-invasive electroencephalography (EEG), which suffers from poor spatial resolution and degraded performance under realistic few-channel configurations. The authors propose an end-to-end EEG-to-image framework that integrates an EEG-conditioned diffusion model with a semantics-guided post-processing enhancement mechanism. Specifically, semantic prompts are extracted from EEG signals using a multimodal large language model and subsequently leveraged by an image-to-image diffusion model to refine reconstruction quality. Systematic evaluation across channel settings from 128 down to 24 demonstrates that the method substantially improves geometric structure and perceptual consistency under low-channel conditions, achieving up to a 9.71% gain in Inception Score with only a marginal increase in Fréchet Inception Distance. User studies further confirm its perceptual advantages, advancing EEG-based image decoding toward practical applications.
📝 Abstract
Reconstructing visual stimuli from non-invasive electroencephalography (EEG) remains challenging due to its low spatial resolution and high noise, particularly under realistic low-density electrode configurations. To address this, we present EEG2Vision, a modular, end-to-end EEG-to-image framework that systematically evaluates reconstruction performance across different EEG resolutions (128, 64, 32, and 24 channels) and enhances visual quality through a prompt-guided post-reconstruction boosting mechanism. Starting from EEG-conditioned diffusion reconstruction, the boosting stage uses a multimodal large language model to extract semantic descriptions and leverages image-to-image diffusion to refine geometry and perceptual coherence while preserving EEG-grounded structure. Our experiments show that semantic decoding accuracy degrades significantly with channel reduction (e.g., 50-way Top-1 accuracy drops from 89% to 38%), while reconstruction quality decreases only slightly (e.g., FID rises from 76.77 to 80.51). The proposed boosting consistently improves perceptual metrics across all configurations, achieving Inception Score (IS) gains of up to 9.71% in low-channel settings. A user study confirms a clear perceptual preference for boosted reconstructions. The proposed approach significantly improves the feasibility of real-time brain-to-image applications using low-resolution EEG devices, potentially enabling such applications outside laboratory settings.
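To put the reported degradation figures on a common scale, the short sketch below computes the relative change for the two metric pairs quoted in the abstract. The helper name `relative_change` is hypothetical, and the abstract does not state how its percentage figures (such as the 9.71% IS gain) were computed; a plain relative difference is assumed here for illustration only.

```python
def relative_change(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Values quoted in the abstract for the reduction in channel count.
# FID is a lower-is-better metric, so a positive change means degradation.
fid_change = relative_change(76.77, 80.51)

# 50-way Top-1 accuracy, in percent (higher is better).
acc_change = relative_change(89.0, 38.0)

print(f"FID relative change:   {fid_change:+.2f}%")
print(f"Top-1 relative change: {acc_change:+.2f}%")
```

Under this assumption, FID worsens by roughly 5% while Top-1 accuracy falls by more than half in relative terms, which is consistent with the abstract's contrast between semantic decoding and reconstruction quality.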