Interpretable Concept-based Deep Learning Framework for Multimodal Human Behavior Modeling

📅 2025-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the longstanding challenge in affective computing of reconciling model interpretability with predictive performance. We propose an interpretable, concept-driven deep learning framework for multimodal human behavior modeling. Its core innovation is the Attention-Guided Concept Model (AGCM), the first of its kind to jointly enable differentiable concept discovery, concept embedding, and cross-modal attention alignment—yielding learnable, spatially localizable, and semantically grounded concept-level explanations that comply with GDPR requirements for high-risk AI systems. The framework natively supports spatiotemporal multimodal inputs and is validated on facial expression recognition benchmarks, then extended to real-world human behavior understanding tasks. It achieves state-of-the-art accuracy while delivering strong domain-specific interpretability, bridging the gap between transparency and performance in affective AI.

📝 Abstract
In the contemporary era of intelligent connectivity, Affective Computing (AC), which enables systems to recognize, interpret, and respond to human behavior states, has become an integral part of many AI systems. As one of the most critical components of responsible AI and trustworthiness in all human-centered systems, explainability has been a major concern in AC. In particular, the recently released EU General Data Protection Regulation requires that any high-risk AI system be sufficiently interpretable, including the biometric-based and emotion recognition systems widely used in the affective computing field. Existing explainable methods often compromise between interpretability and performance. Most of them focus only on highlighting key network parameters without offering meaningful, domain-specific explanations to the stakeholders. Additionally, they also face challenges in effectively co-learning and explaining insights from multimodal data sources. To address these limitations, we propose a novel and generalizable framework, namely the Attention-Guided Concept Model (AGCM), which provides learnable conceptual explanations by identifying which concepts lead to the predictions and where they are observed. AGCM is extendable to any spatial and temporal signals through multimodal concept alignment and co-learning, empowering stakeholders with deeper insights into the model's decision-making process. We validate the effectiveness of AGCM on well-established Facial Expression Recognition benchmark datasets while also demonstrating its generalizability on more complex real-world human behavior understanding applications.
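The paper's own architecture is not reproduced here, but the general idea behind a concept bottleneck with spatial attention, in which the model answers "which concepts drove the prediction" and "where each concept was observed", can be sketched as follows. All dimensions, variable names, and the random weights are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy dimensions (hypothetical): 7 spatial regions, 16-dim features,
# 4 named concepts, 3 output classes.
R, D, C, K = 7, 16, 4, 3
region_feats = rng.standard_normal((R, D))   # per-region input features
concept_embed = rng.standard_normal((C, D))  # learnable concept embeddings
classifier_W = rng.standard_normal((K, C))   # class weights over concepts

# "Where": each concept attends over the spatial regions.
attn_logits = concept_embed @ region_feats.T          # (C, R)
attn = softmax(attn_logits, axis=1)                   # each row sums to 1

# "What": concept activation = attended feature · concept embedding.
attended = attn @ region_feats                        # (C, D)
concept_scores = np.sum(attended * concept_embed, axis=1)  # (C,)

# The prediction passes only through the concept bottleneck, so each
# class logit decomposes exactly into per-concept contributions.
logits = classifier_W @ concept_scores                # (K,)
contrib = classifier_W * concept_scores               # (K, C)
assert np.allclose(contrib.sum(axis=1), logits)
```

Because every class score is a sum of per-concept contributions (`contrib`), and each concept carries an attention map over regions (`attn`), a stakeholder can read off both the concepts responsible for a decision and their spatial locations, which is the kind of "what and where" explanation the abstract describes.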
Problem

Research questions and friction points this paper is trying to address.

Interpretable multimodal behavior modeling
Balancing interpretability and performance
Domain-specific explanations for stakeholders
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention-Guided Concept Model
Multimodal concept alignment
Learnable conceptual explanations