CA-TCN: A Causal-Anticausal Temporal Convolutional Network for Direct Auditory Attention Decoding

📅 2026-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the critical challenge of accurately decoding the speech stream attended by a listener in multi-speaker scenarios using electroencephalography (EEG). The authors propose the CA-TCN model, which introduces—for the first time—a dual-branch temporal convolutional network comprising causal and anticausal pathways to explicitly model the bidirectional temporal alignment between EEG neural responses and speech stimuli. This architecture maintains online processing capability while enhancing decoding accuracy. By integrating EEG spatial filtering with an end-to-end classification framework, the model consistently outperforms existing methods across multiple datasets, achieving accuracy gains of 0.5%–3.2% in subject-independent settings and 0.8%–2.9% in subject-dependent settings, along with excellent spatial robustness.
📝 Abstract
A promising approach for steering auditory attention in complex listening environments relies on Auditory Attention Decoding (AAD), which aims to identify the attended speech stream in a multi-speaker scenario from neural recordings. Entrainment-based AAD approaches typically assume access to clean speech sources and electroencephalography (EEG) signals to exploit low-frequency correlations between the neural response and the attended stimulus. In this study, we propose CA-TCN, a Causal-Anticausal Temporal Convolutional Network that directly classifies the attended speaker. The proposed architecture integrates several best practices from convolutional neural networks for sequence processing tasks. Importantly, it explicitly aligns auditory stimuli and neural responses by employing separate causal and anticausal convolutions, with distinct receptive fields operating in opposite temporal directions. Experimental results, obtained through comparisons with three baseline AAD models, demonstrated that CA-TCN consistently improved decoding accuracy across datasets and decision windows, with gains ranging from 0.5% to 3.2% for subject-independent models and from 0.8% to 2.9% for subject-specific models relative to the next best-performing model, AADNet. Moreover, these improvements were statistically significant in four of the six evaluated settings when comparing Minimum Expected Switch Duration distributions. Beyond accuracy, the model demonstrated spatial robustness across conditions: the learned EEG spatial filters exhibited stable patterns across datasets. Overall, this work introduces an accurate and unified AAD model that outperforms existing methods while retaining practical benefits for online processing scenarios. These findings contribute to advancing the state of AAD and its applicability in real-world systems.
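The abstract's core architectural idea is the pairing of causal and anticausal convolutions: a causal branch whose output at time t sees only past samples, and an anticausal branch that sees only future samples, obtained by time-reversing the causal construction. The paper's code is not reproduced here; the following is a minimal single-channel numpy sketch of that padding trick (function names `causal_conv1d` and `anticausal_conv1d` are illustrative, not from the paper, which uses learned multi-channel TCN blocks):

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1D convolution: output at time t depends only on x[t-k+1 .. t].
    Realized by left-padding with k-1 zeros, the standard TCN construction."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([np.dot(padded[t:t + k], kernel) for t in range(len(x))])

def anticausal_conv1d(x, kernel):
    """Anticausal 1D convolution: output at time t depends only on x[t .. t+k-1].
    Realized as the time-reversed causal case (right-padding)."""
    return causal_conv1d(x[::-1], kernel)[::-1]

# Toy signal: with a length-2 averaging-style kernel, the causal branch lags
# the signal while the anticausal branch leads it.
x = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([1.0, 1.0])
print(causal_conv1d(x, kernel))      # [1. 3. 5. 7.]
print(anticausal_conv1d(x, kernel))  # [3. 5. 7. 4.]
```

In a dual-branch model along these lines, each branch would use stacks of such (dilated, learned) convolutions; the anticausal branch only needs a bounded look-ahead equal to its receptive field, which is what keeps online operation feasible.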
Problem

Research questions and friction points this paper is trying to address.

Auditory Attention Decoding
EEG
speech stream selection
neural decoding
multi-speaker environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal-Anticausal Convolution
Temporal Convolutional Network
Auditory Attention Decoding
EEG-based Classification
Online Neural Decoding
Iñigo García-Ugarte
Department of Sciences, Universidad Pública de Navarra (UPNA), Pamplona, 31006, Navarre, Spain; BCBL, Basque Center on Cognition Brain and Language, San Sebastián, 20009, Spain
Rubén Eguinoa
Department of Sciences, Universidad Pública de Navarra (UPNA), Pamplona, 31006, Navarre, Spain
Ricardo San Martín
Department of Sciences, Universidad Pública de Navarra (UPNA), Pamplona, 31006, Navarre, Spain
Daniel Paternain
Department of Statistics, Computer Science and Mathematics
Artificial Intelligence · Machine Learning · Computer Vision
Carmen Vidaurre
Ikerbasque Research Associate Prof., BCBL & BIFOLD TU-Berlin