🤖 AI Summary
This work argues that existing automatic sleep staging methods rely too heavily on complex models to capture long-range dependencies while overlooking the strong local temporal continuity inherent in sleep sequences. We show that a randomly initialized, untrained Transformer can serve as an effective adaptive sequence smoother, with its performance gains stemming from architectural inductive bias rather than learned parameters. To formalize this insight, we introduce the Random Attention Prior Kernel (RAPK) and use two quantitative metrics, the Local Smoothness Influence Index (LSII) and the Weighted Transition Entropy (WTE), to assess smoothing efficacy and stage-transition preservation. Experiments across multiple datasets show that the untrained random Transformer consistently outperforms conventional heuristic smoothing approaches, pointing to structure-driven smoothing as a lightweight option for deployment on edge devices.
📝 Abstract
Automatic sleep staging commonly adopts Transformers under the assumption that they learn complex long-range dependencies. We challenge this view by revealing a neglected property of sleep sequences: strong local temporal continuity. We show that a randomly initialized Transformer, without any training, substantially improves sleep staging performance and consistently outperforms heuristic smoothing. We formalize this effect via a Random Attention Prior Kernel (RAPK), showing that random self-attention acts as an adaptive smoother by balancing global averaging and content-based similarity while preserving stage transitions. Using two metrics, the Local Smoothness Influence Index (LSII) and the Weighted Transition Entropy (WTE), we provide evidence that most performance gains in Transformer-based sleep staging arise from architectural inductive bias rather than parameter learning. Our results suggest that sleep staging can be effectively addressed with structure-driven smoothing mechanisms rather than complex dependency modeling, enabling more efficient and edge-deployable healthcare systems for large-scale physiological monitoring.
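As a rough illustration of the smoothing effect described above, the following Python sketch applies a single untrained self-attention head to a short sequence of per-epoch stage probabilities. It is not the paper's RAPK formulation or architecture: the function `random_attention_smooth`, the width `d_model`, the Gaussian initialization scale, and the toy probabilities are all assumptions made for illustration. Because each attention row is a softmax, every output row is a convex combination of the input rows, which is the sense in which the head acts as a smoother.

```python
import math
import random


def random_matrix(rows, cols, rng, scale):
    """Gaussian matrix with i.i.d. N(0, scale^2) entries; it is never trained."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_attention_smooth(probs, d_model=16, seed=0):
    """Smooth per-epoch stage probabilities with one untrained attention head.

    probs: list of T rows, each a probability vector over C sleep stages.
    Returns (smoothed, attn), where attn is the T x T row-stochastic matrix.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    n_classes = len(probs[0])
    scale = 1.0 / math.sqrt(n_classes)  # assumed initialization scale
    w_q = random_matrix(n_classes, d_model, rng, scale)
    w_k = random_matrix(n_classes, d_model, rng, scale)

    q = matmul(probs, w_q)                     # T x d_model queries
    k = matmul(probs, w_k)                     # T x d_model keys
    k_t = [list(col) for col in zip(*k)]       # d_model x T
    scores = matmul(q, k_t)                    # T x T similarity scores
    attn = [softmax([s / math.sqrt(d_model) for s in row]) for row in scores]

    # Each output row is an attention-weighted average of the input rows.
    smoothed = matmul(attn, probs)
    return smoothed, attn


# Toy sequence of six epochs over three stages (made-up values).
probs = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.3, 0.6, 0.1],  # an isolated noisy epoch
    [0.9, 0.1, 0.0],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
]
smoothed, attn = random_attention_smooth(probs)
for row in smoothed:
    print([round(v, 3) for v in row])
```

When the random scores are close to zero the softmax is nearly uniform, so the head behaves like a global average; larger score differences shift weight toward epochs with similar content. This sketch omits positional encodings, multiple heads, and residual connections, so it says nothing about how a full Transformer encodes temporal locality.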