🤖 AI Summary
To address the poor image generation quality stemming from low spatial resolution and high noise levels in EEG signals, this paper proposes a lightweight end-to-end framework that directly synthesizes high-fidelity images from single-trial raw EEG segments—tailored for low-cost, real-time brain–computer interface applications. Methodologically, we introduce the first EEG-specific lightweight ControlNet adapter, seamlessly steering a latent diffusion model (LDM) without requiring pretraining, multi-stage losses, or external text encoders. Crucially, our approach employs end-to-end EEG feature encoding to minimize preprocessing dependencies. Evaluated on mainstream EEG benchmarks, our method achieves significantly lower Fréchet Inception Distance (FID) than state-of-the-art methods, alongside superior semantic alignment and visual fidelity. The code is publicly available; inference is efficient and deployment-ready.
📝 Abstract
Generating images from brain waves is gaining increasing attention due to its potential to advance brain-computer interface (BCI) systems by understanding how brain signals encode visual cues. Most of the literature has focused on fMRI-to-image tasks, as fMRI is characterized by high spatial resolution. However, fMRI is an expensive neuroimaging modality and does not allow for real-time BCI. On the other hand, electroencephalography (EEG) is a low-cost, non-invasive, and portable neuroimaging technique, making it an attractive option for future real-time applications. Nevertheless, EEG presents inherent challenges due to its low spatial resolution and susceptibility to noise and artifacts, which make generating images from EEG more difficult. In this paper, we address these problems with a streamlined framework based on the ControlNet adapter for conditioning a latent diffusion model (LDM) through EEG signals. We conduct experiments and ablation studies on popular benchmarks to demonstrate that the proposed method outperforms other state-of-the-art models. Unlike these methods, which often require extensive preprocessing, pretraining, multiple loss terms, and captioning models, our approach is efficient and straightforward, requiring only minimal preprocessing and a few components. The code is available at https://github.com/LuigiSigillo/GWIT.
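The core idea behind ControlNet-style conditioning described above can be illustrated with a toy sketch: a trainable copy of a frozen backbone block receives the conditioning signal (here, hypothetical encoded EEG features) and is merged back through a zero-initialized projection, so at initialization the conditioned model reproduces the frozen backbone exactly. This is a minimal NumPy stand-in, not the paper's actual implementation; all names (`frozen_block`, `controlnet_block`, the feature dimension `d`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # illustrative feature dimension

def frozen_block(x, W):
    # Stand-in for one frozen U-Net block of the latent diffusion model.
    return np.tanh(x @ W)

W = rng.normal(size=(d, d))       # frozen backbone weights (kept fixed)
W_cond = rng.normal(size=(d, d))  # trainable copy, fed the EEG condition
Z = np.zeros((d, d))              # "zero convolution": zero-initialized merge

def controlnet_block(x, eeg_feat):
    # ControlNet-style conditioning: the adapter branch processes the latent
    # plus the EEG features, and its output is merged through a
    # zero-initialized projection. At initialization the merge contributes
    # nothing, so the output equals the frozen backbone's output.
    backbone = frozen_block(x, W)
    adapter = frozen_block(x + eeg_feat, W_cond)
    return backbone + adapter @ Z

x = rng.normal(size=(1, d))    # latent features at some diffusion step
eeg = rng.normal(size=(1, d))  # hypothetical single-trial EEG embedding

# At init, conditioning leaves the frozen backbone's behavior unchanged:
assert np.allclose(controlnet_block(x, eeg), frozen_block(x, W))
```

The zero-initialized merge is what makes the adapter "lightweight" to train: only the copied branch and the projections receive gradients, while the pretrained diffusion backbone stays frozen.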