EEG-VLM: A Hierarchical Vision-Language Model with Multi-Level Feature Alignment and Visually Enhanced Language-Guided Reasoning for EEG Image-Based Sleep Stage Prediction

2025-11-24
Citations: 0
Influential: 0
AI Summary
To address the challenge of jointly achieving fine-grained time-frequency modeling and clinical interpretability in EEG-based sleep staging, this paper proposes a hierarchical vision-language model. Methodologically, it introduces a vision-enhancement module to generate high-level semantic tokens, integrates multi-level feature alignment to fuse time-frequency representations with CLIP-pretrained linguistic priors, and incorporates a chain-of-thought (CoT) reasoning module to emulate expert decision-making logic. Unlike existing approaches, the model eliminates handcrafted features while preserving end-to-end learnability, improving both discriminative accuracy and clinical interpretability. Evaluated on public EEG sleep staging datasets, it achieves a 3.2% absolute gain in classification accuracy; moreover, its generated CoT reasoning paths are strongly consistent with clinical annotations. This work establishes a novel paradigm for automated, trustworthy EEG analysis.

Abstract
Sleep stage classification based on electroencephalography (EEG) is fundamental for assessing sleep quality and diagnosing sleep-related disorders. However, most traditional machine learning methods rely heavily on prior knowledge and handcrafted features, while existing deep learning models still struggle to jointly capture fine-grained time-frequency patterns and achieve clinical interpretability. Recently, vision-language models (VLMs) have made significant progress in the medical domain, yet their performance remains constrained when applied to physiological waveform data, especially EEG signals, due to their limited visual understanding and insufficient reasoning capability. To address these challenges, we propose EEG-VLM, a hierarchical vision-language framework that integrates multi-level feature alignment with visually enhanced language-guided reasoning for interpretable EEG-based sleep stage classification. Specifically, a specialized visual enhancement module constructs high-level visual tokens from intermediate-layer features to extract rich semantic representations of EEG images. These tokens are further aligned with low-level CLIP features through a multi-level alignment mechanism, enhancing the VLM's image-processing capability. In addition, a Chain-of-Thought (CoT) reasoning strategy decomposes complex medical inference into interpretable logical steps, effectively simulating expert-like decision-making. Experimental results demonstrate that the proposed method significantly improves both the accuracy and interpretability of VLMs in EEG-based sleep stage classification, showing promising potential for automated and explainable EEG analysis in clinical settings.
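The abstract says that high-level visual tokens are aligned with low-level CLIP features, but this page does not give the mechanics. As a rough illustration only, the sketch below assumes the simplest possible form: project the high-level tokens to the CLIP token width with a linear map, then concatenate both token sets along the sequence axis. The function names, the dimensions, and the random weights are hypothetical and are not taken from the paper.

```python
import random

def linear(tokens, weight):
    """Project each token with a weight matrix laid out as weight[d_in][d_out]."""
    columns = list(zip(*weight))  # one tuple of d_in weights per output dimension
    return [[sum(x * w for x, w in zip(tok, col)) for col in columns] for tok in tokens]

def fuse_multi_level(clip_tokens, high_level_tokens, weight):
    """Assumed fusion: project high-level tokens to CLIP width, then concatenate."""
    projected = linear(high_level_tokens, weight)
    return clip_tokens + projected

# Hypothetical sizes: 4 low-level CLIP tokens of width 8, 2 high-level tokens of width 6.
random.seed(0)
clip_dim, high_dim = 8, 6
clip_tokens = [[random.random() for _ in range(clip_dim)] for _ in range(4)]
high_level_tokens = [[random.random() for _ in range(high_dim)] for _ in range(2)]
weight = [[random.gauss(0, 0.1) for _ in range(clip_dim)] for _ in range(high_dim)]

fused = fuse_multi_level(clip_tokens, high_level_tokens, weight)
print(len(fused), len(fused[0]))
```

Under these assumptions the fused sequence holds six tokens of width eight, which is the shape a language model's input projector would then consume. A learned alignment would train `weight` rather than sample it.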
Problem

Research questions and friction points this paper is trying to address.

Improves EEG sleep stage classification accuracy and interpretability using vision-language models
Addresses limited visual understanding in physiological waveform analysis through feature alignment
Enhances clinical reasoning capability via visually guided Chain-of-Thought decomposition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical vision-language framework for EEG classification
Multi-level feature alignment with visual enhancement module
Chain-of-Thought reasoning for interpretable medical inference
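To make the Chain-of-Thought idea concrete, the following minimal sketch builds a stepwise prompt for a single EEG image. It assumes the five-stage labeling (W, N1, N2, N3, REM) that is common in sleep scoring; the step wording, the `build_cot_prompt` helper, and the prompt layout are all hypothetical and do not reproduce the paper's actual prompts.

```python
# Assumed label set; the paper's exact label set is not stated on this page.
STAGES = ("W", "N1", "N2", "N3", "REM")

# Hypothetical reasoning steps, written for illustration only.
COT_STEPS = (
    "Describe the dominant frequency bands and amplitude visible in the EEG image.",
    "Note any characteristic waveforms, such as sleep spindles, K-complexes, or slow waves.",
    "Relate these observations to the candidate sleep stages.",
    "State the single most likely stage.",
)

def build_cot_prompt(stages=STAGES, steps=COT_STEPS):
    """Return a prompt that asks for intermediate reasoning before the final label."""
    lines = ["You are reviewing one EEG epoch rendered as an image."]
    lines.append("Work through the following in order:")
    for index, step in enumerate(steps, start=1):
        lines.append(f"Step {index}: {step}")
    lines.append("Answer with one of: " + ", ".join(stages))
    return "\n".join(lines)

prompt = build_cot_prompt()
print(prompt)
```

Keeping the steps in a tuple makes it easy to compare prompt variants, for example by dropping the waveform step to see how much the intermediate reasoning contributes.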
Authors

Xihe Qiu
Associate Professor, Shanghai University of Engineering Science
AI for Healthcare, Vision-Language Models, Reinforcement Learning, Large Language Models
Gengchen Ma
School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, 201620, China
Haoyu Wang
Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University, Shanghai 200092, China
Chen Zhan
Bioinformatician / Research Fellow, University of Adelaide
Bioinformatics, Data Mining, Pharmacoepidemiology, Artificial Intelligence
Xiaoyu Tan
Tencent Youtu Lab, Shanghai 200232, China
Shuo Li
Case Western Reserve University, USA