🤖 AI Summary
Existing EEG-based approaches for predicting speech intelligibility struggle to match the accuracy and robustness of behavioral testing, limiting their applicability in populations unable to comply with behavioral tasks. This work proposes a multi-decoder fusion strategy that integrates hundreds of decoders trained on diverse speech features and EEG preprocessing configurations to construct high-dimensional neural tracking representations. These representations are then combined with support vector regression to predict speech reception thresholds (SRTs). Using only 15 minutes of EEG data, the method achieves objective SRT estimates highly consistent with behavioral measurements (r = 0.647, p < 0.001; NRMSE = 0.19), with all prediction errors within 1 dB, thereby substantially improving both prediction accuracy and practical utility.
📝 Abstract
Objective: EEG-based methods can predict speech intelligibility, but their accuracy and robustness lag behind behavioral tests, which typically show test-retest differences under 1 dB. We introduce a multi-decoder method to predict speech reception thresholds (SRTs) from EEG recordings, enabling objective assessment in populations unable to perform behavioral tests, such as people with disorders of consciousness or patients during hearing aid fitting. Approach: The method aggregates data from hundreds of decoders, each trained on a different combination of speech features and EEG preprocessing setups, to quantify neural tracking (NT) of speech signals. Using data from 39 participants (ages 18-24), we recorded 29 minutes of EEG per person while they listened to speech at six signal-to-noise ratios (SNRs) and to a story in quiet. NT values were combined into a high-dimensional feature vector per subject, and a support vector regression model was trained to predict SRTs from these vectors. Main Result: Predictions correlated significantly with behavioral SRTs (r = 0.647, p < 0.001; NRMSE = 0.19), with all differences under 1 dB. SHAP analysis showed that theta/delta bands and early lags had slightly greater influence. Using pretrained subject-independent decoders reduced the required EEG recording time to 15 minutes (3 minutes of story, 12 minutes across the six SNR conditions) without losing accuracy.
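The prediction stage described in the Approach can be sketched as follows. This is a minimal illustration with synthetic data, not the authors' implementation: the number of decoders, the SVR kernel and hyperparameters, the cross-validation scheme, and the NRMSE normalization (here, range of the true SRTs) are all assumptions for the sake of a runnable example.

```python
# Sketch: per-subject neural-tracking (NT) feature vectors -> SVR -> SRT.
# All dimensions, hyperparameters, and the synthetic data are illustrative.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n_subjects, n_decoders = 39, 300  # 39 participants; "hundreds" of decoders

# Each row: one subject's NT values across all decoder configurations.
X = rng.normal(size=(n_subjects, n_decoders))
# Synthetic ground-truth SRTs (dB) depending on a few NT features plus noise.
true_srt = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=n_subjects)

# Standardize features, then fit a support vector regressor.
model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0))

# Leave-one-subject-out: predict each subject's SRT from the other 38.
pred_srt = cross_val_predict(model, X, true_srt, cv=LeaveOneOut())

# Evaluate with Pearson r and a range-normalized RMSE (one common definition).
r = np.corrcoef(true_srt, pred_srt)[0, 1]
nrmse = np.sqrt(np.mean((true_srt - pred_srt) ** 2)) / np.ptp(true_srt)
print(f"r = {r:.3f}, NRMSE = {nrmse:.3f}")
```

Leave-one-subject-out evaluation is a natural choice at this sample size (39 subjects) because it maximizes training data per fold while keeping the evaluated subject fully held out.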