Affective Behaviour Analysis via Progressive Learning

📅 2024-07-24
🏛️ arXiv.org
📈 Citations: 3
✨ Influential: 0
🤖 AI Summary
This work addresses multi-task affective behavior analysis in unconstrained, real-world environments. We propose a progressive multi-task learning framework that jointly performs Valence-Arousal (VA) estimation, facial expression recognition, and Action Unit (AU) detection. Methodologically, we introduce a novel staged training paradigm: (1) independent pretraining of task-specific backbone networks; followed by (2) joint optimization via cross-task feature fusion, temporal modeling (LSTM/Transformer), and adaptive task weighting to identify the optimal multi-task synergy mechanism. Evaluated on the ABAW7 Challenge, our approach achieves first place globally (overall score: 1.5286), with AU F-score = 0.5580, expression F-score = 0.4286, and VA Concordance Correlation Coefficient (CCC) = 0.5420, substantially outperforming both single-task baselines and existing joint-training methods. These results validate the effectiveness and generalizability of our progressive multi-task design for complex, in-the-wild affective behavior analysis.

📝 Abstract
Affective Behavior Analysis aims to develop emotionally intelligent technology that can recognize and respond to human emotions. To advance this field, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition holds the Multi-Task Learning Challenge based on the s-Aff-Wild2 database. The participants are required to develop a framework that achieves Valence-Arousal Estimation, Expression Recognition, and AU detection simultaneously. To achieve this goal, we propose a progressive multi-task learning framework that fully leverages the distinct focuses of each task on facial emotion features. Specifically, our method design can be summarized into three main aspects: 1) Separate Training and Joint Training: We first train each task model separately and then perform joint training based on the pre-trained models, fully utilizing the feature focus aspects of each task to improve the overall framework performance. 2) Feature Fusion and Temporal Modeling: We investigate effective strategies for fusing features extracted from each task-specific model and incorporate temporal feature modeling during the joint training phase, which further refines the performance of each task. 3) Joint Training Strategy Optimization: To identify the optimal joint training approach, we conduct a comprehensive strategy search, experimenting with various task combinations and training methodologies to further elevate the overall performance of each task. According to the official results, our team achieves first place in the MTL challenge with a total score of 1.5286 (i.e., AU F-score 0.5580, Expression F-score 0.4286, CCC VA score 0.5420). Our code is publicly available at https://github.com/YenanLiu/ABAW7th.
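The separate-then-joint schedule described in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the mock per-task losses and the inverse-loss weighting rule are assumptions standing in for real training and for whatever adaptive weighting the paper's strategy search selects.

```python
# Hypothetical sketch of progressive multi-task training: Stage 1 pretrains
# each task separately; Stage 2 combines task losses with adaptive weights.
# All numbers and the weighting rule are illustrative, not from the paper.

TASKS = ["va", "expr", "au"]  # Valence-Arousal, Expression, Action Units

def pretrain(task):
    """Stage 1: train each task-specific model on its own objective.
    Here we simply return a mock validation loss per task."""
    mock_losses = {"va": 0.8, "expr": 0.6, "au": 0.5}
    return mock_losses[task]

def adaptive_weights(losses):
    """Stage 2 helper: weight each task inversely to its current loss so
    harder tasks are not drowned out (one of many possible rules)."""
    inv = {t: 1.0 / l for t, l in losses.items()}
    total = sum(inv.values())
    return {t: v / total for t, v in inv.items()}

def joint_step(losses):
    """One joint-training step: combine task losses into a single
    weighted objective for the shared optimization."""
    w = adaptive_weights(losses)
    return sum(w[t] * losses[t] for t in TASKS)

losses = {t: pretrain(t) for t in TASKS}
print(round(joint_step(losses), 4))  # → 0.6102
```

With inverse-loss weights the combined objective equalizes each task's contribution, which is one simple way to realize the "fully utilizing the feature focus aspects of each task" idea in the joint phase.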
Problem

Research questions and friction points this paper is trying to address.

Develop emotionally intelligent technology for emotion recognition.
Simultaneously achieve Valence-Arousal Estimation, Expression Recognition, and AU detection.
Propose a progressive multi-task learning framework for improved performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive multi-task learning framework
Feature fusion and temporal modeling
Joint training strategy optimization
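As a toy illustration of the feature-fusion and temporal-modeling aspects, the sketch below concatenates hypothetical per-task feature vectors and applies a moving average in place of the paper's learned LSTM/Transformer; the feature values and window size are made up for demonstration.

```python
# Illustrative data flow for cross-task fusion + temporal modeling.
# The real system uses learned encoders and an LSTM/Transformer; a moving
# average stands in here so the structure is visible in a few lines.

def fuse(va_feat, expr_feat, au_feat):
    """Cross-task fusion by simple concatenation of per-frame features."""
    return va_feat + expr_feat + au_feat  # list concat ~ channel concat

def temporal_smooth(seq, window=3):
    """Stand-in for temporal modeling: average each fused frame with its
    recent neighbours to inject temporal context."""
    out = []
    for i in range(len(seq)):
        ctx = seq[max(0, i - window + 1): i + 1]
        out.append([sum(col) / len(ctx) for col in zip(*ctx)])
    return out

# Four frames of mock per-task features (one value per task for brevity).
frames = [fuse([0.1 * t], [0.2 * t], [0.3 * t]) for t in range(4)]
print(temporal_smooth(frames)[-1])
```

Each smoothed frame keeps the fused dimensionality (here 3, one channel per task) while blending in context from preceding frames, which is the role the LSTM/Transformer plays during the joint training phase.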
Chen Liu
The University of Queensland, Queensland, Australia
Wei Zhang
NetEase Fuxi AI Lab, Hangzhou, China
Feng Qiu
Argonne National Laboratory
Lincheng Li
Netease Fuxi AI Lab
Xin Yu
The University of Queensland, Queensland, Australia