Evaluating Test-Time Adaptation For Facial Expression Recognition Under Natural Cross-Dataset Distribution Shifts

📅 2026-03-20

🤖 AI Summary

This study addresses the performance degradation that facial expression recognition (FER) models suffer in real-world use because of natural distribution shifts across datasets, which arise from differences in acquisition protocols, annotation criteria, and population demographics. It presents the first systematic evaluation of test-time adaptation (TTA) methods under natural (non-synthetic) distribution shifts, using cross-dataset experiments with TENT, SAR, T3A, and SHOT. The findings show that TTA effectiveness depends on the distributional distance and the noise level of the target domain: TTA improves accuracy by up to 11.34%, and the best method differs by setting. Entropy minimization (TENT, SAR) performs best on clean targets, prototype adjustment (T3A) on targets with larger distributional distance, and feature alignment (SHOT) on targets noisier than the source.

📝 Abstract

Deep learning models often struggle under natural distribution shifts, a common challenge in real-world deployments. Test-Time Adaptation (TTA) addresses this by adapting models during inference without labeled source data. We present the first evaluation of TTA methods for facial expression recognition (FER) under natural domain shifts, performing cross-dataset experiments with widely used FER datasets. This moves beyond synthetic corruptions to examine real-world shifts caused by differing collection protocols, annotation standards, and demographics. Results show that TTA can boost FER performance under natural shifts by up to 11.34%. Entropy minimization methods such as TENT and SAR perform best when the target distribution is clean. In contrast, prototype adjustment methods like T3A excel when the distributional distance between source and target is larger. Finally, feature alignment methods such as SHOT deliver the largest gains when the target distribution is noisier than the source. Our cross-dataset analysis shows that TTA effectiveness is governed by the distributional distance and the severity of the natural shift across domains.
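To make the entropy minimization idea concrete, here is a minimal sketch of the quantity that methods in the TENT family drive down on unlabeled test batches: the Shannon entropy of the model's softmax predictions. It is an illustration only, not the paper's implementation. The function names, the logit values, and the choice of 7 expression classes are assumptions made for the example, and the parameter update step that a real TTA method would perform is omitted.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution for one sample."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_batch_entropy(batch_logits):
    """Average entropy over a batch: the unsupervised objective to minimize."""
    return sum(prediction_entropy(l) for l in batch_logits) / len(batch_logits)


# Hypothetical logits for 7 expression classes (assumed for illustration).
confident = [6.0, 0.5, 0.2, 0.1, 0.0, -0.3, -0.5]  # one class dominates
uncertain = [0.0] * 7                              # uniform prediction

batch = [confident, uncertain]
print("confident:", prediction_entropy(confident))
print("uncertain:", prediction_entropy(uncertain))
print("batch mean:", mean_batch_entropy(batch))
```

A uniform prediction reaches the maximum entropy, ln(7) for 7 classes, while a confident prediction scores much lower. Because this objective needs no labels, it can be computed at inference time; it also rewards confident predictions whether or not they are correct, which is one intuition for why such methods are reported to work best when the target distribution is clean.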

Problem

Research questions and friction points this paper is trying to address.

Test-Time Adaptation

Facial Expression Recognition

Distribution Shift

Cross-Dataset Evaluation

Domain Generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-Time Adaptation

Facial Expression Recognition

Natural Distribution Shift