🤖 AI Summary
This work addresses multi-task affective behavior analysis in unconstrained, real-world environments. We propose a progressive multi-task learning framework that jointly performs Valence-Arousal (VA) estimation, facial expression recognition, and Action Unit (AU) detection. Methodologically, we introduce a novel staged training paradigm: (1) independent pretraining of task-specific backbone networks; followed by (2) joint optimization via cross-task feature fusion, temporal modeling (LSTM/Transformer), and adaptive task weighting to identify the optimal multi-task synergy mechanism. Evaluated on the ABAW7 Challenge, our approach achieves first place globally (overall score: 1.5286), with AU F-score = 0.5580, expression F-score = 0.4286, and VA Concordance Correlation Coefficient (CCC) = 0.5420, substantially outperforming both single-task baselines and existing joint-training methods. These results validate the effectiveness and generalizability of our progressive multi-task design for complex, in-the-wild affective behavior analysis.
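The VA metric above is the Concordance Correlation Coefficient, and the reported overall score is consistent with simply summing the three per-task metrics. A minimal sketch of both, assuming the standard CCC definition and that the challenge's overall score is the sum of the AU F-score, expression F-score, and (already-averaged) VA CCC:

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient: 2*cov / (var_t + var_p + (mu_t - mu_p)^2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (y_true.var() + y_pred.var() + (mu_t - mu_p) ** 2)

# The reported per-task metrics sum to the reported overall score:
au_f1, expr_f1, va_ccc = 0.5580, 0.4286, 0.5420
overall = au_f1 + expr_f1 + va_ccc
print(round(overall, 4))  # 1.5286
```

A perfect prediction yields CCC = 1, and any mean or scale mismatch between predictions and targets lowers it, which is why CCC (rather than plain Pearson correlation) is used for continuous valence/arousal evaluation.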
📝 Abstract
Affective Behavior Analysis aims to develop emotionally intelligent technology that can recognize and respond to human emotions. To advance this field, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition hosts the Multi-Task Learning Challenge based on the s-Aff-Wild2 database. Participants are required to develop a framework that performs Valence-Arousal Estimation, Expression Recognition, and AU Detection simultaneously. To achieve this goal, we propose a progressive multi-task learning framework that fully leverages the distinct focus of each task on facial emotion features. Specifically, our method design can be summarized into three main aspects: 1) Separate Training and Joint Training: We first train each task model separately and then perform joint training based on the pre-trained models, fully exploiting each task's distinct feature focus to improve the overall framework performance. 2) Feature Fusion and Temporal Modeling: We investigate effective strategies for fusing features extracted from each task-specific model and incorporate temporal feature modeling during the joint training phase, which further refines the performance of each task. 3) Joint Training Strategy Optimization: To identify the optimal joint training approach, we conduct a comprehensive strategy search, experimenting with various task combinations and training methodologies to further elevate the overall performance of each task. According to the official results, our team achieves first place in the MTL challenge with a total score of 1.5286 (i.e., AU F-score 0.5580, Expression F-score 0.4286, VA CCC score 0.5420). Our code is publicly available at https://github.com/YenanLiu/ABAW7th.
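The staged design described above, separately pretrained task backbones followed by joint training with cross-task feature fusion, temporal modeling, and adaptive task weighting, can be sketched roughly as follows. This is an illustrative toy sketch, not the authors' actual architecture: the module names, feature sizes, head dimensions, and the uncertainty-style loss weighting are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class ProgressiveMTL(nn.Module):
    """Toy sketch: fuse frozen task-specific features, model time with an
    LSTM, and combine task losses via learned per-task weights."""

    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        # Stand-ins for the three separately pretrained backbones (stage 1).
        self.backbones = nn.ModuleDict(
            {t: nn.Linear(512, feat_dim) for t in ("va", "expr", "au")}
        )
        for p in self.backbones.parameters():
            p.requires_grad = False              # keep stage-1 weights frozen
        # Stage 2: cross-task feature fusion + temporal modeling.
        self.fuse = nn.Linear(3 * feat_dim, hidden)
        self.temporal = nn.LSTM(hidden, hidden, batch_first=True)
        # Task heads (sizes are assumptions): 2 VA values, 8 expressions, 12 AUs.
        self.heads = nn.ModuleDict({
            "va": nn.Linear(hidden, 2),
            "expr": nn.Linear(hidden, 8),
            "au": nn.Linear(hidden, 12),
        })
        # Adaptive task weighting via learned log-variances.
        self.log_vars = nn.ParameterDict(
            {t: nn.Parameter(torch.zeros(())) for t in self.heads}
        )

    def forward(self, frames):                   # frames: (B, T, 512) features
        feats = torch.cat([b(frames) for b in self.backbones.values()], dim=-1)
        h, _ = self.temporal(self.fuse(feats))   # (B, T, hidden)
        return {t: head(h) for t, head in self.heads.items()}

    def weighted_loss(self, losses):
        # Uncertainty-style weighting: loss * exp(-s)/2 + s/2, s = log sigma^2.
        return sum(
            losses[t] * torch.exp(-self.log_vars[t]) / 2 + self.log_vars[t] / 2
            for t in losses
        )

model = ProgressiveMTL()
out = model(torch.randn(2, 16, 512))             # 2 clips of 16 frames each
loss = model.weighted_loss({t: out[t].pow(2).mean() for t in out})
```

Freezing the backbones keeps the per-task representations learned in stage 1 intact while the fusion, temporal, and head layers learn cross-task synergy; in practice the backbones could instead be fine-tuned at a lower learning rate.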