🤖 AI Summary
Existing EEG-based visual decoding methods neglect the brain’s intrinsic hierarchical organization. Method: This paper proposes a tri-stream hierarchical decoding framework grounded in Hubel and Wiesel's theory of visual processing, which explicitly decomposes visual stimuli into contours, foreground objects, and scene context and models each with a dedicated EEG encoder. A cross-attention routing mechanism enables progressive cross-modal feature fusion, while CLIP-aligned hierarchical contrastive learning supports biologically interpretable zero-shot recognition. Contribution/Results: To the authors' knowledge, this is the first work to explicitly embed canonical visual cortical pathway modeling into an EEG decoding pipeline. On the THINGS-EEG dataset, the method achieves 40.9% within-subject and 22.9% cross-subject Top-1 accuracy, surpassing the previous state of the art by over 45%.
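The summary mentions CLIP-aligned contrastive learning without detail. A minimal NumPy sketch of a standard symmetric InfoNCE objective between EEG features and CLIP image embeddings is shown below; all names, dimensions, and the temperature value are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def info_nce(eeg, clip_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired EEG / CLIP embeddings.

    Row i of `eeg` and row i of `clip_emb` are treated as a positive pair;
    all other rows in the batch serve as negatives (standard CLIP-style loss).
    """
    eeg = eeg / np.linalg.norm(eeg, axis=1, keepdims=True)
    clip_emb = clip_emb / np.linalg.norm(clip_emb, axis=1, keepdims=True)
    logits = eeg @ clip_emb.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(eeg))                     # matching pairs on the diagonal

    def xent(l):
        # Cross-entropy with the diagonal as the target class.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the EEG->image and image->EEG directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
B, d = 16, 64
eeg = rng.normal(size=(B, d))
img = eeg + 0.1 * rng.normal(size=(B, d))            # correlated positive pairs
# Correlated pairs should yield a lower loss than randomly matched pairs.
print(info_nce(eeg, img) < info_nce(eeg, rng.normal(size=(B, d))))  # True
```

In ViEEG this alignment is applied hierarchically (one objective per stream level, per the abstract); the sketch shows only the single-level building block.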
📝 Abstract
Understanding and decoding brain activity into visual representations is a fundamental challenge at the intersection of neuroscience and artificial intelligence. While EEG-based visual decoding has shown promise due to its non-invasive, low-cost nature and millisecond-level temporal resolution, existing methods are limited by their reliance on flat neural representations that overlook the brain's inherent visual hierarchy. In this paper, we introduce ViEEG, a biologically inspired hierarchical EEG decoding framework that aligns with the Hubel-Wiesel theory of visual processing. ViEEG decomposes each visual stimulus into three biologically aligned components (contour, foreground object, and contextual scene) that serve as anchors for a three-stream EEG encoder. These EEG features are progressively integrated via cross-attention routing, simulating cortical information flow from V1 to IT to the association cortex. We further adopt hierarchical contrastive learning to align EEG representations with CLIP embeddings, enabling zero-shot object recognition. Extensive experiments on the THINGS-EEG dataset demonstrate that ViEEG achieves state-of-the-art performance, with 40.9% Top-1 accuracy in the subject-dependent setting and 22.9% in the cross-subject setting, surpassing existing methods by over 45%. Our framework not only advances the performance frontier but also sets a new paradigm for biologically grounded brain decoding in AI.
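To make the cross-attention routing idea concrete, here is a hypothetical NumPy sketch of single-head cross-attention routing across the three streams. The function names, residual fusion, token counts, and dimensions are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query, context, d):
    # Single-head cross-attention: each query token aggregates context tokens
    # weighted by scaled dot-product similarity.
    scores = query @ context.T / np.sqrt(d)          # (Tq, Tc) similarity
    return softmax(scores) @ context                 # (Tq, d) attended features

rng = np.random.default_rng(0)
T, d = 8, 64                                         # tokens per stream, feature dim
contour_f = rng.normal(size=(T, d))                  # contour-stream EEG features
object_f  = rng.normal(size=(T, d))                  # foreground-object stream
scene_f   = rng.normal(size=(T, d))                  # contextual-scene stream

# Progressive routing with residual fusion, mimicking the V1 -> IT ->
# association-cortex information flow described in the abstract.
object_fused = object_f + cross_attend(object_f, contour_f, d)
scene_fused  = scene_f + cross_attend(scene_f, object_fused, d)

print(scene_fused.shape)  # (8, 64)
```

The one-directional ordering (contour informs object, object informs scene) is what distinguishes this routing from a flat fusion of all three streams.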