🤖 AI Summary
In dynamic facial expression recognition (DFER) under unconstrained real-world conditions, motion blur and the semantic ambiguity of expressions severely hinder accurate classification. To address this, we propose the first video-level soft-label mixup augmentation method, which jointly performs convex interpolation across video frames and their corresponding multi-emotion probability soft labels, explicitly modeling both expression continuity and semantic uncertainty. Our approach comprises three components: (1) soft-label construction via emotion distribution estimation, (2) soft-label-guided frame-level mixup augmentation, and (3) an end-to-end trainable framework. Evaluated on the DFEW benchmark, our method achieves significant improvements over existing state-of-the-art methods, demonstrating that soft-label mixing enhances model robustness to ambiguous, dynamically evolving expressions in the wild. This work establishes a novel paradigm for uncertainty-aware learning in DFER, advancing the integration of probabilistic semantics into video-based representation learning.
📝 Abstract
Dynamic facial expression recognition (DFER) is an important task in the field of computer vision. To apply automatic DFER in practice, it is necessary to accurately recognize ambiguous facial expressions, which often appear in data in the wild. In this paper, we propose MIDAS, a data augmentation method for DFER, which augments ambiguous facial expression data with soft labels consisting of probabilities for multiple emotion classes. In MIDAS, the training data are augmented by convexly combining pairs of video frames and their corresponding emotion class labels, which can also be regarded as an extension of mixup to soft-labeled video data. This simple extension is remarkably effective in DFER with ambiguous facial expression data. To evaluate MIDAS, we conducted experiments on the DFEW dataset. The results demonstrate that the model trained on the data augmented by MIDAS outperforms the existing state-of-the-art method trained on the original dataset.
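The core operation described above, convexly combining pairs of video clips and their soft emotion labels, can be sketched as follows. This is a minimal illustration of video-level mixup under assumed NumPy array conventions; the function name `video_mixup` and the data shapes are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def video_mixup(frames_a, label_a, frames_b, label_b, alpha=0.2, rng=None):
    """Convexly combine two clips of shape (T, H, W, C) and their
    soft labels of shape (K,), as in standard mixup extended to video."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing ratio sampled from Beta(alpha, alpha)
    # Frame-wise convex interpolation of the two videos
    mixed_frames = lam * frames_a + (1.0 - lam) * frames_b
    # The same convex combination applied to the soft labels
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_frames, mixed_label, lam

# Usage with toy data: two 16-frame clips and 3-class soft labels
a = np.random.rand(16, 112, 112, 3)
b = np.random.rand(16, 112, 112, 3)
ya = np.array([0.7, 0.2, 0.1])  # ambiguous expression: probability over emotions
yb = np.array([0.1, 0.1, 0.8])
x, y, lam = video_mixup(a, ya, b, yb)
```

Because both inputs are probability distributions, the mixed label `y` remains a valid distribution (it sums to 1), which is what lets the augmented data represent ambiguous, multi-emotion expressions.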