🤖 AI Summary
Existing benchmarks lack joint evaluation of fairness and robustness under domain shift, hindering systematic assessment of fair domain generalization. Method: We introduce Face4FairShifts, a large-scale facial image benchmark designed for this purpose, comprising 100,000 images across four visually distinct domains with 39 fine-grained annotations spanning 14 attributes. The dataset is constructed via multi-source collection, distribution balancing, and cross-domain attribute alignment to support both fairness-aware learning and domain generalization evaluation. Contribution/Results: Extensive experiments reveal significant fairness degradation in state-of-the-art models under domain shift (e.g., up to a 12.7% accuracy gap across skin-tone groups), demonstrating Face4FairShifts' effectiveness in exposing methodological limitations and advancing fairness-aware domain adaptation techniques.
📝 Abstract
Ensuring fairness and robustness in machine learning models remains a challenge, particularly under domain shifts. We present Face4FairShifts, a large-scale facial image benchmark designed to systematically evaluate fairness-aware learning and domain generalization. The dataset includes 100,000 images across four visually distinct domains, with 39 annotations spanning 14 attributes that cover demographic and facial features. Through extensive experiments, we analyze model performance under distribution shifts and identify significant fairness and accuracy gaps. Our findings expose the limitations of existing related datasets and the need for more effective fairness-aware domain adaptation techniques. Face4FairShifts provides a comprehensive testbed for advancing equitable and reliable AI systems. The dataset is available online at https://meviuslab.github.io/Face4FairShifts/.
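The fairness degradation discussed above is typically measured as an accuracy gap between demographic groups, evaluated per target domain. A minimal sketch of that metric is shown below; the function and array names are illustrative and not part of any official Face4FairShifts API.

```python
import numpy as np

def group_accuracy_gap(y_true, y_pred, groups):
    """Largest pairwise accuracy difference across demographic groups.

    A gap of 0 means all groups are classified equally well;
    larger values indicate stronger fairness degradation.
    """
    accuracies = []
    for g in np.unique(groups):
        mask = groups == g  # select samples belonging to group g
        accuracies.append(float((y_true[mask] == y_pred[mask]).mean()))
    return max(accuracies) - min(accuracies)

# Toy example with two groups: group "a" is classified perfectly,
# group "b" only half the time, yielding a 0.5 accuracy gap.
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(group_accuracy_gap(y_true, y_pred, groups))  # → 0.5
```

In a domain-generalization setting, this gap would be computed separately on each held-out domain; comparing the per-domain gaps reveals how much a model's fairness deteriorates under distribution shift.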