🤖 AI Summary
Existing face recognition methods often exhibit unstable performance under variations in age, pose, and occlusion, and typically rely on a fixed set of facial attributes for auxiliary supervision, overlooking the heterogeneous contributions of these attributes to identity discrimination. This work proposes an attribute-aware multi-task learning framework that explicitly groups attributes based on their relevance to identity, decoupling identity-relevant from identity-irrelevant attributes. By doing so, the model is guided to focus on identity-critical regions while suppressing irrelevant features. The approach not only leverages attribute learning as a diagnostic tool for shortcut learning but also significantly enhances the discriminability of learned embeddings on standard face verification benchmarks, thereby validating the effectiveness of selectively exploiting identity-relevant attributes and actively suppressing non-informative ones.
📝 Abstract
Despite recent advances in face recognition, robust performance remains challenging under large variations in age, pose, and occlusion. A common strategy to address these issues is to guide representation learning with auxiliary supervision from facial attributes, encouraging the visual encoder to focus on identity-relevant regions. However, existing approaches typically rely on heterogeneous and fixed sets of attributes, implicitly assuming equal relevance across attributes. This assumption is suboptimal, as different attributes exhibit varying discriminative power for identity recognition, and some may even introduce harmful biases. In this paper, we propose an attribute-aware face recognition architecture that supervises the learning of facial embeddings using identity class labels, identity-relevant facial attributes, and non-identity-related attributes. Facial attributes are organized into interpretable groups, making it possible to decompose and analyze their individual contributions in a human-understandable manner. Experiments on standard face verification benchmarks demonstrate that joint learning of identity and facial attributes improves the discriminability of face embeddings, with two major conclusions: (i) supervision with identity-relevant subsets of facial attributes consistently outperforms supervision with a broader attribute set, and (ii) explicitly forcing embeddings to unlearn non-identity-related attributes yields further performance gains compared to leaving such attributes unsupervised. Additionally, our method serves as a diagnostic tool for assessing the trustworthiness of face recognition encoders: it measures the accuracy gained when non-identity-relevant attributes are suppressed, and such gains indicate shortcut learning from redundant attributes correlated with each identity.
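The objective described above combines three supervision signals on a shared embedding: the identity classification loss, losses for identity-relevant attribute heads, and a sign-flipped term that pushes the embedding to unlearn non-identity attributes. A minimal NumPy sketch of such a combined objective is given below; the function names, loss weights, and the sign-flip realization of "unlearning" (one common choice, in the spirit of gradient reversal) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def multitask_loss(id_logits, id_label,
                   rel_logits, rel_labels,
                   irrel_logits, irrel_labels,
                   w_rel=0.5, w_irrel=0.5):
    """Hypothetical combined objective for an attribute-aware encoder.

    rel_logits / irrel_logits are lists of per-attribute-head logits for
    identity-relevant and non-identity attributes, respectively.
    """
    # Primary task: identity classification on the shared embedding.
    loss_id = cross_entropy(id_logits, id_label)
    # Identity-relevant attribute heads: supervised normally (positive sign).
    loss_rel = sum(cross_entropy(l, y) for l, y in zip(rel_logits, rel_labels))
    # Non-identity attribute heads: the sign is flipped, so minimizing the
    # total loss discourages the embedding from encoding these attributes
    # (an adversarial "unlearning" term; exact mechanism is an assumption).
    loss_irrel = sum(cross_entropy(l, y) for l, y in zip(irrel_logits, irrel_labels))
    return loss_id + w_rel * loss_rel - w_irrel * loss_irrel
```

In a practical implementation the attribute heads would share the encoder backbone, and the reversed term is usually realized with a gradient-reversal layer so that the heads themselves still learn to predict while only the encoder unlearns.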