Brain-CLIPLM: Decoding Compressed Semantic Representations in EEG for Language Reconstruction

📅 2026-03-23

🤖 AI Summary

This work addresses the challenge of reconstructing sentence-level natural language from non-invasive electroencephalography (EEG) signals, which suffer from a low signal-to-noise ratio and limited bandwidth. The authors propose a "semantic compression" hypothesis, which holds that EEG primarily encodes recoverable semantic anchors rather than full linguistic structure. Building on this hypothesis, they introduce a two-stage decoding framework. First, ordered semantic anchors are recovered from EEG via contrastive learning. Then, a retrieval-augmented large language model guided by chain-of-thought prompting reconstructs sentence semantics from these anchors. Splitting EEG-to-text translation into anchor recovery and anchor-guided generation matches the decoding granularity to the amount of information EEG can actually carry. Evaluated on the Zurich Cognitive Language Processing Corpus (ZuCo), the method achieves 67.6% top-5 and 85.0% top-25 sentence retrieval accuracy. Control analyses, including permutation testing, indicate that the EEG-derived anchors carry sentence-specific semantic information beyond the priors of the language model.
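This summary describes stage one only as contrastive learning between EEG and semantic anchors. The sketch below illustrates one plausible form of that objective under a stated assumption: a CLIP-style symmetric InfoNCE loss. The "CLIP" in the model name suggests this, but the summary does not confirm it. Within a batch, each EEG embedding should be most similar to the embedding of its own anchor text, and each anchor text should be most similar to its own EEG embedding. The embeddings, the temperature of 0.07, and all function names are hypothetical, and the encoders that would produce the embeddings are not shown.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def similarity_matrix(eeg: List[Vector], text: List[Vector], temperature: float) -> List[List[float]]:
    """Cosine similarity of every EEG embedding (rows) with every text embedding (columns), divided by the temperature."""
    eeg_n = [l2_normalize(e) for e in eeg]
    text_n = [l2_normalize(t) for t in text]
    return [[sum(a * b for a, b in zip(e, t)) / temperature for t in text_n] for e in eeg_n]


def cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i (the matched pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(eeg: List[Vector], text: List[Vector], temperature: float = 0.07) -> float:
    """Hypothetical CLIP-style loss: average of EEG-to-text and text-to-EEG cross-entropy."""
    logits = similarity_matrix(eeg, text, temperature)
    transposed = [list(col) for col in zip(*logits)]  # text-to-EEG direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


if __name__ == "__main__":
    eeg = [[1.0, 0.0], [0.0, 1.0]]
    print(symmetric_contrastive_loss(eeg, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs: near 0
    print(symmetric_contrastive_loss(eeg, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs: about 14.29
```

With correctly matched pairs the loss approaches zero. Swapping the text embeddings, so that every EEG vector is paired with the wrong anchor, drives the loss up sharply. That asymmetry is what pushes an encoder toward sentence-specific representations.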

📝 Abstract

Decoding natural language from non-invasive electroencephalography (EEG) remains fundamentally limited by a low signal-to-noise ratio and restricted information bandwidth. This raises the question of whether sentence-level linguistic structure can be reliably recovered from such signals at all. In this work, we suggest that the implicit assumption that it can may not hold under realistic information constraints. We instead propose a semantic compression hypothesis, in which EEG signals encode a compressed set of semantic anchors rather than full linguistic structure. From this perspective, direct sentence reconstruction is an overparameterized objective relative to the intrinsic information capacity of EEG. To address this mismatch, we introduce Brain-CLIPLM, a two-stage framework that decomposes EEG-to-text decoding into two parts: (1) semantic anchor extraction via contrastive learning, and (2) sentence reconstruction using a retrieval-grounded large language model (LLM) with Chain-of-Thought (CoT) reasoning. This decomposition follows a granularity matching principle that aligns decoding complexity with neural information capacity. Evaluated on the Zurich Cognitive Language Processing Corpus (ZuCo), Brain-CLIPLM achieves 67.55% top-5 and 85.00% top-25 sentence retrieval accuracy, significantly outperforming a direct decoding baseline, and cross-subject evaluation confirms robust generalization. Control analyses, including permutation testing, further demonstrate that EEG-derived representations carry sentence-specific information beyond language model priors. These results suggest that EEG-to-text decoding is better framed as recovering compressed semantic content than as reconstructing full sentences. Framed this way, it offers a biologically grounded and data-efficient pathway for non-invasive brain-computer interfaces.
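The reported top-5 and top-25 figures are sentence retrieval accuracies. The permutation test checks that these accuracies exceed what shuffled EEG-to-sentence pairings would produce. The sketch below shows one common way to compute both. It assumes that each EEG trial has a score for every sentence in a candidate pool and that the null distribution is built by shuffling which sentence counts as correct for each trial. The candidate-pool setup, the number of permutations, and the function names are hypothetical, since the paper's exact evaluation protocol is not given here.

```python
import random
from typing import List, Sequence, Tuple


def top_k_accuracy(scores: Sequence[Sequence[float]], targets: Sequence[int], k: int) -> float:
    """Fraction of EEG trials whose correct sentence index ranks among the k highest-scoring candidates."""
    hits = 0
    for row, target in zip(scores, targets):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


def permutation_p_value(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int,
    n_permutations: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Hypothetical permutation test: compare observed top-k accuracy with accuracies under shuffled targets."""
    rng = random.Random(seed)
    observed = top_k_accuracy(scores, targets, k)
    shuffled: List[int] = list(targets)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the trial-to-sentence pairing
        if top_k_accuracy(scores, shuffled, k) >= observed:
            at_least_as_good += 1
    # Add-one correction so a Monte Carlo p-value is never exactly zero.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy example: 20 trials whose correct sentence always receives the highest score.
    n = 20
    scores = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    targets = list(range(n))
    accuracy, p = permutation_p_value(scores, targets, k=1, n_permutations=200)
    print(accuracy, p)
```

Under shuffled pairings, accuracy falls toward chance (roughly k divided by the pool size for uninformative scores). A small p-value therefore indicates that the EEG-derived scores identify specific sentences rather than just reflecting generic properties of the candidate pool.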

Problem

Research questions and friction points this paper is trying to address.

EEG-to-text decoding

semantic compression

non-invasive EEG

sentence-level language

information bandwidth

Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic compression

EEG-to-text decoding

contrastive learning