🤖 AI Summary
This work addresses multi-task affective behavior analysis in unconstrained, real-world environments. We propose a progressive multi-task learning framework that jointly performs Valence-Arousal (VA) estimation, facial expression recognition, and Action Unit (AU) detection. Methodologically, we introduce a novel staged training paradigm: (1) independent pretraining of task-specific backbone networks; followed by (2) joint optimization via cross-task feature fusion, temporal modeling (LSTM/Transformer), and adaptive task weighting to identify the optimal multi-task synergy mechanism. Evaluated on the ABAW7 Challenge, our approach achieves first place globally (overall score: 1.5286), with AU F-score = 0.5580, expression F-score = 0.4286, and VA Concordance Correlation Coefficient (CCC) = 0.5420, substantially outperforming both single-task baselines and existing joint-training methods. These results validate the effectiveness and generalizability of our progressive multi-task design for complex, in-the-wild affective behavior analysis.
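The VA metric above is the Concordance Correlation Coefficient, and the reported overall score is consistent with simply summing the three per-task metrics. A minimal sketch of both, assuming the standard CCC definition and that the challenge's overall score is the sum of the AU F-score, expression F-score, and (already-averaged) VA CCC:

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient: 2*cov / (var_t + var_p + (mu_t - mu_p)^2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (y_true.var() + y_pred.var() + (mu_t - mu_p) ** 2)

# The reported per-task metrics sum to the reported overall score:
au_f1, expr_f1, va_ccc = 0.5580, 0.4286, 0.5420
overall = au_f1 + expr_f1 + va_ccc
print(round(overall, 4))  # 1.5286
```

A perfect prediction yields CCC = 1, and any mean or scale mismatch between predictions and targets lowers it, which is why CCC (rather than plain Pearson correlation) is used for continuous valence/arousal evaluation.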
📝 Abstract
Affective Behavior Analysis aims to develop emotionally intelligent technology that can recognize and respond to human emotions. To advance this field, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition hosts the Multi-Task Learning Challenge based on the s-Aff-Wild2 database. Participants are required to develop a framework that performs Valence-Arousal Estimation, Expression Recognition, and AU Detection simultaneously. To achieve this goal, we propose a progressive multi-task learning framework that fully leverages the distinct focus of each task on facial emotion features. Specifically, our method design can be summarized into three main aspects: 1) Separate Training and Joint Training: We first train each task model separately and then perform joint training based on the pre-trained models, fully exploiting each task's distinct feature focus to improve the overall framework performance. 2) Feature Fusion and Temporal Modeling: We investigate effective strategies for fusing features extracted from each task-specific model and incorporate temporal feature modeling during the joint training phase, which further refines the performance of each task. 3) Joint Training Strategy Optimization: To identify the optimal joint training approach, we conduct a comprehensive strategy search, experimenting with various task combinations and training methodologies to further elevate the overall performance of each task. According to the official results, our team achieves first place in the MTL challenge with a total score of 1.5286 (i.e., AU F-score 0.5580, Expression F-score 0.4286, VA CCC score 0.5420). Our code is publicly available at https://github.com/YenanLiu/ABAW7th.
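The staged design described above, separately pretrained task backbones followed by joint training with cross-task feature fusion, temporal modeling, and adaptive task weighting, can be sketched roughly as follows. This is an illustrative toy sketch, not the authors' actual architecture: the module names, feature sizes, head dimensions, and the uncertainty-style loss weighting are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class ProgressiveMTL(nn.Module):
    """Toy sketch: fuse frozen task-specific features, model time with an
    LSTM, and combine task losses via learned per-task weights."""

    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        # Stand-ins for the three separately pretrained backbones (stage 1).
        self.backbones = nn.ModuleDict(
            {t: nn.Linear(512, feat_dim) for t in ("va", "expr", "au")}
        )
        for p in self.backbones.parameters():
            p.requires_grad = False              # keep stage-1 weights frozen
        # Stage 2: cross-task feature fusion + temporal modeling.
        self.fuse = nn.Linear(3 * feat_dim, hidden)
        self.temporal = nn.LSTM(hidden, hidden, batch_first=True)
        # Task heads (sizes are assumptions): 2 VA values, 8 expressions, 12 AUs.
        self.heads = nn.ModuleDict({
            "va": nn.Linear(hidden, 2),
            "expr": nn.Linear(hidden, 8),
            "au": nn.Linear(hidden, 12),
        })
        # Adaptive task weighting via learned log-variances.
        self.log_vars = nn.ParameterDict(
            {t: nn.Parameter(torch.zeros(())) for t in self.heads}
        )

    def forward(self, frames):                   # frames: (B, T, 512) features
        feats = torch.cat([b(frames) for b in self.backbones.values()], dim=-1)
        h, _ = self.temporal(self.fuse(feats))   # (B, T, hidden)
        return {t: head(h) for t, head in self.heads.items()}

    def weighted_loss(self, losses):
        # Uncertainty-style weighting: loss * exp(-s)/2 + s/2, s = log sigma^2.
        return sum(
            losses[t] * torch.exp(-self.log_vars[t]) / 2 + self.log_vars[t] / 2
            for t in losses
        )

model = ProgressiveMTL()
out = model(torch.randn(2, 16, 512))             # 2 clips of 16 frames each
loss = model.weighted_loss({t: out[t].pow(2).mean() for t in out})
```

Freezing the backbones keeps the per-task representations learned in stage 1 intact while the fusion, temporal, and head layers learn cross-task synergy; in practice the backbones could instead be fine-tuned at a lower learning rate.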