🤖 AI Summary
Dialogue engagement estimation faces three challenges that hinder robust deployment of human–computer interaction systems: poor cross-domain generalization, limited adaptability to cross-cultural and multilingual settings, and difficulty in modeling interactive dynamics. To address these challenges, we propose a domain-adaptive engagement modeling framework with two components: (1) a domain prompt mechanism that employs learnable, domain-specific vectors to guide input representation learning; and (2) a parallel cross-attention module that integrates forward and backward BiLSTMs to jointly model interlocutors’ reactive behaviors and anticipatory state alignment. Evaluated on multiple cross-cultural and multilingual benchmarks, our method achieves significant gains in generalization performance, including an absolute improvement of 0.45 in Concordance Correlation Coefficient (CCC) on the NoXi-J test set. It also secured first place in the MultiMediate’25 Multi-Domain Engagement Estimation Challenge.
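For readers unfamiliar with the metric, the following is a minimal sketch of how CCC can be computed. It follows Lin's standard definition with population (biased) statistics. The function name `concordance_cc` and the convention of returning 0.0 when the denominator is zero are assumptions made for illustration; they are not taken from the paper or from the challenge's evaluation code.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance Correlation Coefficient between two equal-length sequences.

    CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean_pred - mean_gold)^2)
    """
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_p = fmean(pred)
    mean_g = fmean(gold)

    # Population (divide-by-n) variance and covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        # Both sequences are constant and identical: CCC is undefined.
        # Returning 0.0 is an arbitrary choice for this sketch.
        return 0.0
    return 2.0 * cov / denom


if __name__ == "__main__":
    gold = [1.0, 2.0, 3.0]
    print(concordance_cc(gold, gold))             # perfect agreement
    print(concordance_cc([2.0, 3.0, 4.0], gold))  # same trend, constant offset
```

Unlike Pearson correlation, CCC penalizes systematic offsets: in the second call the predictions are perfectly correlated with the labels, yet the constant shift of 1 lowers the score to 4/7 (about 0.571).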
📝 Abstract
Accurate engagement estimation is essential for adaptive human-computer interaction systems, yet robust deployment is hindered by poor generalizability across diverse domains and by the difficulty of modeling complex interaction dynamics. To tackle these issues, we propose DAPA (Domain-Adaptive Parallel Attention), a novel framework for generalizable conversational engagement modeling. DAPA introduces a Domain Prompting mechanism that prepends learnable domain-specific vectors to the input, explicitly conditioning the model on the data's origin to facilitate domain-aware adaptation while preserving generalizable engagement representations. To capture interactional synchrony, the framework also incorporates a Parallel Cross-Attention module that explicitly aligns reactive (forward BiLSTM) and anticipatory (backward BiLSTM) states between participants. Extensive experiments demonstrate that DAPA establishes new state-of-the-art performance on several cross-cultural and cross-linguistic benchmarks, notably achieving an absolute improvement of 0.45 in Concordance Correlation Coefficient (CCC) over a strong baseline on the NoXi-J test set. Our method also won first place in the Multi-Domain Engagement Estimation Challenge at MultiMediate'25.
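The Domain Prompting idea described above, prepending domain-specific vectors to the input sequence, can be illustrated with a small sketch. This is not the authors' implementation: the class name `DomainPrompt`, the prompt length, the feature dimension, the initialization scale, and the second domain name are all hypothetical. In an actual model the prompt vectors would be trainable parameters updated by backpropagation; here they are plain Python lists so that the example runs with the standard library alone.

```python
import random


class DomainPrompt:
    """Keeps one block of prompt vectors per domain and prepends it to a sequence.

    Illustrative only: in a trained model these vectors would be learnable
    parameters rather than fixed random lists.
    """

    def __init__(self, domains, prompt_len, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One (prompt_len x dim) block of small random values per domain.
        self.prompts = {
            name: [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                   for _ in range(prompt_len)]
            for name in domains
        }

    def prepend(self, domain, frames):
        """Return the domain's prompt vectors followed by the input frames."""
        if any(len(frame) != self.dim for frame in frames):
            raise ValueError("feature dimension mismatch")
        # A new list is built, so the caller's input is left unmodified.
        return self.prompts[domain] + list(frames)


if __name__ == "__main__":
    # "noxi_j" mirrors the NoXi-J dataset named in the abstract;
    # "other_domain" is a made-up placeholder.
    prompter = DomainPrompt(["noxi_j", "other_domain"], prompt_len=2, dim=3)
    frames = [[0.1, 0.2, 0.3] for _ in range(4)]  # 4 time steps of features
    conditioned = prompter.prepend("noxi_j", frames)
    print(len(frames), "->", len(conditioned))    # sequence grows by prompt_len
```

Because the prompt occupies the first positions of the sequence, any downstream sequence encoder sees the domain identity before the actual features, which is one way to condition a shared model on the data's origin.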