SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model

📅 2026-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limited clinical applicability of existing automated sleep staging methods due to their lack of auditable reasoning processes. To overcome this, the authors propose the first rule-guided vision–language model that leverages polysomnographic waveform images to perform sleep staging strictly in accordance with the American Academy of Sleep Medicine (AASM) scoring guidelines, while generating human-interpretable justifications for its predictions. Through waveform-aware pretraining followed by rule-supervised fine-tuning, the model achieves Cohen’s kappa scores of 0.767 and 0.743 on the MASS-SS1 and ZUAMHCS datasets, respectively. Expert evaluations consistently rate the model’s reasoning quality above 4.0 out of 5.0 across all assessed dimensions. The work also introduces a new dataset, MASS-EX, to advance research in interpretable sleep analysis.
📝 Abstract
While automated sleep staging has achieved expert-level accuracy, its clinical adoption is hindered by a lack of auditable reasoning. We introduce SleepVLM, a rule-grounded vision-language model (VLM) designed to stage sleep from multi-channel polysomnography (PSG) waveform images while generating clinician-readable rationales based on American Academy of Sleep Medicine (AASM) scoring criteria. Utilizing waveform-perceptual pre-training and rule-grounded supervised fine-tuning, SleepVLM achieved Cohen's kappa scores of 0.767 on a held-out test set (MASS-SS1) and 0.743 on an external cohort (ZUAMHCS), matching state-of-the-art performance. Expert evaluations further validated the quality of the model's reasoning, with mean scores exceeding 4.0/5.0 for factual accuracy, evidence comprehensiveness, and logical coherence. By coupling competitive performance with transparent, rule-based explanations, SleepVLM may improve the trustworthiness and auditability of automated sleep staging in clinical workflows. To facilitate further research in interpretable sleep medicine, we release MASS-EX, a novel expert-annotated dataset.
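For readers unfamiliar with the agreement metric reported above, the following is a minimal sketch of how Cohen's kappa can be computed between expert-scored and model-predicted stages. It is not the authors' evaluation code: the function name `cohens_kappa`, the stage labels, and the toy epoch sequences are illustrative assumptions, and no data from the paper is used.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)

    # Observed agreement: fraction of epochs where both raters give the same stage.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Expected agreement by chance, from each rater's marginal stage frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[s] * pred_counts[s] for s in ref_counts) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined here.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Toy example (made-up labels, one per 30-second epoch; not data from the paper).
expert = ["W", "N1", "N2", "N2", "N3", "R", "N2", "W"]
model = ["W", "N2", "N2", "N2", "N3", "R", "N1", "W"]
kappa = cohens_kappa(expert, model)
print(f"kappa = {kappa:.3f}")
```

In this toy case the two sequences agree on 6 of 8 epochs (0.75 observed agreement) with 0.25 agreement expected by chance, so kappa is (0.75 - 0.25) / (1 - 0.25). Because kappa discounts chance agreement, it is generally lower than raw accuracy on the same predictions.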
Problem

Research questions and friction points this paper is trying to address.

sleep staging
explainability
clinical adoption
auditable reasoning
interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language model
rule-grounded reasoning
explainable AI
sleep staging
polysomnography
Guifeng Deng
Affiliated Mental Health Center & Hangzhou Seventh People’s Hospital, School of Brain Science and Brain Medicine, and Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China; College of Biomedical Engineering & Instrument Science, Zhejiang University, Hangzhou 310058, China
Pan Wang
Department of Psychiatry and Mental Health, Wenzhou Medical University, Wenzhou 325035, Zhejiang Province, China; Affiliated Mental Health Center & Hangzhou Seventh People’s Hospital, School of Brain Science and Brain Medicine, and Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
Jiquan Wang
MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, Hangzhou, 311121, China; Affiliated Mental Health Center & Hangzhou Seventh People’s Hospital, School of Brain Science and Brain Medicine, and Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China; Zhejiang Key Laboratory of Clinical and Basic Research for Psychiatric Diseases, Hangzhou 310058, China
Shuying Rao
Affiliated Mental Health Center & Hangzhou Seventh People’s Hospital, School of Brain Science and Brain Medicine, and Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China; College of Biomedical Engineering & Instrument Science, Zhejiang University, Hangzhou 310058, China
Junyi Xie
Affiliated Mental Health Center & Hangzhou Seventh People’s Hospital, School of Brain Science and Brain Medicine, and Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
Wanjun Guo
Affiliated Mental Health Center & Hangzhou Seventh People’s Hospital, School of Brain Science and Brain Medicine, and Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China; MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, Hangzhou, 311121, China; Zhejiang Key Laboratory of Clinical and Basic Research for Psychiatric Diseases, Hangzhou 310058, China
Tao Li
Affiliated Mental Health Center & Hangzhou Seventh People’s Hospital, School of Brain Science and Brain Medicine, and Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China; MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, Hangzhou, 311121, China; Zhejiang Key Laboratory of Clinical and Basic Research for Psychiatric Diseases, Hangzhou 310058, China
Haiteng Jiang
MOE Frontier Science Center for Brain Science and Brain-Machine Integration, Zhejiang University
Neuroengineering, Machine Learning, Cognitive Neuroscience