Beyond black and white: A more nuanced approach to facial recognition with continuous ethnicity labels

πŸ“… 2025-06-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Facial recognition models suffer from data bias due to the discretization of ethnic labels, which obscures the inherent continuity of ethnic distributions and leads to the fundamental flaw that β€œequal identity counts β‰  balanced data.” This work pioneers a continuous ethnicity modeling paradigm, replacing discrete labels with continuous ethnic embeddings. We propose a novel training framework integrating spectral-distance-driven dynamic reweighting sampling and multi-scale ethnic distribution alignment. Evaluated across 65+ models and 20+ data subsets, our method achieves an average 12.7% improvement in cross-ethnic recognition accuracy and significantly reduces false positive and false negative rate disparities. It establishes a new paradigm for data balance in continuous ethnic space and provides both theoretically grounded and empirically scalable foundations for fairness-aware modeling in facial recognition.

Technology Category

Application Category

πŸ“ Abstract
Bias has been a constant in face recognition models. Over the years, researchers have looked at it from both the model and the data point of view. However, their approach to mitigation of data bias was limited and lacked insight on the real nature of the problem. Here, in this document, we propose to revise our use of ethnicity labels as a continuous variable instead of a discrete value per identity. We validate our formulation both experimentally and theoretically, showcasing that not all identities from one ethnicity contribute equally to the balance of the dataset; thus, having the same number of identities per ethnicity does not represent a balanced dataset. We further show that models trained on datasets balanced in the continuous space consistently outperform models trained on data balanced in the discrete space. We trained more than 65 different models, and created more than 20 subsets of the original datasets.
Problem

Research questions and friction points this paper is trying to address.

Addressing bias in face recognition models using continuous ethnicity labels
Challenging discrete ethnicity labels for more balanced datasets
Improving model performance with continuous ethnicity-based dataset balancing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous ethnicity labels replace discrete values
Dataset balance assessed in continuous label space
Models trained on continuous labels outperform discrete ones
πŸ”Ž Similar Papers
No similar papers found.
Pedro C. Neto
Pedro C. Neto
Artificial Intelligence Scientist at Unilabs and Invited Assistant Professor at FEUP
Artificial IntelligenceDeep LearningMachine LearningComputer VisionBiometrics
N
N. Damer
Fraunhofer Institute for Computer Graphics Research IGD, Technische Universitat Darmstadt
J
Jaime S. Cardoso
FEUP & INESC TEC
A
Ana F. Sequeira
INESC TEC