🤖 AI Summary
This study addresses the limited clinical adoption of existing automated sleep-staging methods, which stems from their lack of auditable reasoning. To overcome this, the authors propose SleepVLM, which they describe as the first rule-grounded vision–language model that stages sleep from polysomnographic waveform images strictly in accordance with the American Academy of Sleep Medicine (AASM) scoring guidelines, while generating human-interpretable justifications for its predictions. Through waveform-perceptual pretraining followed by rule-grounded supervised fine-tuning, the model achieves Cohen's kappa scores of 0.767 and 0.743 on the MASS-SS1 and ZUAMHCS datasets, respectively. Expert evaluations consistently rate its reasoning quality above 4.0 out of 5.0 across all assessed dimensions. The work also introduces MASS-EX, a new expert-annotated dataset, to advance research in interpretable sleep analysis.
📝 Abstract
While automated sleep staging has achieved expert-level accuracy, its clinical adoption is hindered by a lack of auditable reasoning. We introduce SleepVLM, a rule-grounded vision-language model (VLM) that stages sleep from multi-channel polysomnography (PSG) waveform images while generating clinician-readable rationales based on American Academy of Sleep Medicine (AASM) scoring criteria. Through waveform-perceptual pre-training and rule-grounded supervised fine-tuning, SleepVLM achieves Cohen's kappa scores of 0.767 on a held-out test set (MASS-SS1) and 0.743 on an external cohort (ZUAMHCS), matching state-of-the-art performance. Expert evaluations further validate the quality of the model's reasoning, with mean scores exceeding 4.0/5.0 for factual accuracy, evidence comprehensiveness, and logical coherence. By coupling competitive performance with transparent, rule-based explanations, SleepVLM may improve the trustworthiness and auditability of automated sleep staging in clinical workflows. To facilitate further research in interpretable sleep medicine, we release MASS-EX, a new expert-annotated dataset.
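Both the summary and the abstract report performance as Cohen's kappa, the chance-corrected agreement between the model's stage labels and the expert reference, which is the standard headline metric for sleep staging. The paper's evaluation code is not shown here; the sketch below is a minimal, illustrative computation of kappa for the five AASM stages (W, N1, N2, N3, REM), and the `cohen_kappa` helper and label arrays are hypothetical, not taken from the paper.

```python
# Minimal sketch: Cohen's kappa for five-class AASM sleep staging.
# Labels, helper name, and the ~80% agreement rate are illustrative only.
import numpy as np

STAGES = ["W", "N1", "N2", "N3", "REM"]

def cohen_kappa(y_true, y_pred, n_classes=5):
    """Chance-corrected agreement between two label sequences."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):          # build the confusion matrix
        cm[t, p] += 1
    n = cm.sum()
    p_observed = np.trace(cm) / n             # raw agreement on the diagonal
    # Agreement expected by chance from the row/column marginals.
    p_expected = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2
    return (p_observed - p_expected) / (1.0 - p_expected)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expert = rng.integers(0, 5, size=1000)    # hypothetical expert labels
    # Model agrees with the expert on ~80% of epochs, random elsewhere.
    model = np.where(rng.random(1000) < 0.8,
                     expert, rng.integers(0, 5, size=1000))
    print(f"Cohen's kappa: {cohen_kappa(expert, model):.3f}")
```

By the common Landis–Koch reading of kappa, values in the 0.61–0.80 band count as substantial agreement, so the reported 0.767 and 0.743 sit comfortably in that range.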