🤖 AI Summary
Existing EEG-based auditory attention decoding (AAD) algorithms suffer from low prediction accuracy on short time windows and poor robustness in multi-speaker scenarios, which hinders real-time deployment. To address this, we propose a general-purpose post-processing framework based on Hidden Markov Models (HMMs) that explicitly models the temporal continuity and transition dynamics of attention states. The framework is compatible with both causal and non-causal decoding settings. It treats raw short-window decoder outputs as observations and jointly learns state transition probabilities and observation likelihoods, achieving efficient temporal smoothing without increasing the computational burden of the underlying decoder. Experiments across varying window lengths, attention-switching frequencies, and baseline decoder accuracies demonstrate consistent performance gains (average +3.2% AUC), minimal computational overhead, and millisecond-level latency. This significantly enhances the practicality and robustness of AAD systems for real-world applications.
📝 Abstract
Auditory attention decoding (AAD) algorithms exploit brain signals, such as electroencephalography (EEG), to identify which speaker a listener is focusing on in a multi-speaker environment. While state-of-the-art AAD algorithms can identify the attended speaker on short time windows, their predictions are often too inaccurate for practical use. In this work, we propose augmenting AAD with a hidden Markov model (HMM) that models the temporal structure of attention. More specifically, the HMM relies on the fact that, at any moment in time, a subject is much less likely to switch attention than to keep attending to the same speaker. We show how an HMM can significantly improve existing AAD algorithms in both causal (real-time) and non-causal (offline) settings. We further demonstrate that HMMs outperform existing postprocessing approaches in both accuracy and responsiveness, and explore how various factors such as window length, switching frequency, and AAD accuracy influence overall performance. The proposed method is computationally efficient, intuitive to use, and applicable in both real-time and offline settings.
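To make the idea of a "sticky" attention model concrete, the following is a minimal sketch of causal HMM filtering (the standard forward recursion) applied to per-window decoder outputs. It is not the authors' implementation. The function name `forward_filter`, the fixed self-transition probability `p_stay`, and the assumption that each window's decoder output has already been converted to a per-speaker likelihood are all illustrative choices; the summary above states that the proposed framework learns transition and observation parameters rather than fixing them.

```python
import numpy as np

def forward_filter(p_obs, p_stay=0.95):
    """Causal HMM filtering of per-window AAD decisions (illustrative sketch).

    p_obs[t, k] is the assumed likelihood of window t's decoder output
    given that speaker k is attended. p_stay is a hypothetical, hand-set
    probability of keeping attention on the same speaker between windows.
    Returns an array of posteriors P(attended speaker | windows 0..t).
    """
    p_obs = np.asarray(p_obs, dtype=float)
    n_windows, n_speakers = p_obs.shape

    # Sticky transition matrix: staying is far more likely than switching.
    trans = np.full((n_speakers, n_speakers), (1.0 - p_stay) / (n_speakers - 1))
    np.fill_diagonal(trans, p_stay)

    belief = np.full(n_speakers, 1.0 / n_speakers)  # uniform prior
    posteriors = np.empty((n_windows, n_speakers))
    for t in range(n_windows):
        predicted = belief @ trans       # propagate belief one window ahead
        belief = predicted * p_obs[t]    # weight by the current observation
        belief /= belief.sum()           # normalize to a probability vector
        posteriors[t] = belief
    return posteriors

# Toy input: a weak decoder favoring speaker 0, with one outlier window.
raw = np.array([[0.6, 0.4]] * 5 + [[0.4, 0.6]] + [[0.6, 0.4]] * 5)
smoothed = forward_filter(raw)
print(raw.argmax(axis=1))       # raw decisions flip at the outlier window
print(smoothed.argmax(axis=1))  # filtered decisions stay on speaker 0
```

Because the recursion only uses past and current windows, it corresponds to the causal (real-time) setting; an offline variant would add a backward pass over the full recording. Lowering `p_stay` makes the filter react faster to genuine attention switches at the cost of more spurious flips.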