🤖 AI Summary
This paper addresses the persistent challenge of sensitive-attribute bias in computer vision models, a bias that existing debiasing methods often mask rather than eliminate. We propose a novel XAI-based evaluation framework grounded in saliency maps, which systematically quantifies the spatial and semantic distance between model decision regions and protected attributes. Crucially, it distinguishes “superficial fairness” (bias concealment) from “causal fairness” (genuine independence from sensitive attributes). Empirical evaluation across multiple benchmark datasets and state-of-the-art debiased models demonstrates that effective debiasing consistently redirects saliency away from sensitive features; moreover, artifact removal techniques exhibit transferable fairness improvements. The proposed metrics provide interpretable, verifiable, quantitative diagnostics for fairness assessment, thereby enhancing the trustworthiness and ethical robustness of AI systems.
📝 Abstract
The widespread adoption of machine learning systems has raised critical concerns about fairness and bias, making the mitigation of harmful biases essential to AI development. In this paper, we investigate the relationship between fairness improvement and the removal of harmful biases in neural networks applied to computer vision tasks. First, we introduce a set of novel XAI-based metrics that analyze saliency maps to assess shifts in a model's decision-making process. Then, we demonstrate that successful debiasing methods systematically redirect model focus away from protected attributes. Additionally, we show that techniques originally developed for artifact removal can be effectively repurposed for fairness. These findings underscore the importance of ensuring that models are fair for the right reasons, contributing to the development of more ethical and trustworthy AI systems.
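The exact metric definitions are given in the paper, not in this summary. As an illustrative sketch only, one simple spatial measure of the kind described, the share of a model's saliency mass that falls inside a protected-attribute region, could look like the hypothetical helper below (the function name and toy data are assumptions, not the paper's method):

```python
import numpy as np

def saliency_attribute_overlap(saliency: np.ndarray, attribute_mask: np.ndarray) -> float:
    """Fraction of total saliency mass falling inside the protected-attribute region.

    saliency: non-negative 2-D saliency map (H, W), e.g. produced by Grad-CAM.
    attribute_mask: boolean 2-D mask (H, W) marking pixels of the sensitive attribute.
    A value near 0 suggests the model's evidence lies away from the attribute;
    a value near 1 suggests the decision is driven by it.
    """
    saliency = np.asarray(saliency, dtype=float)
    total = saliency.sum()
    if total == 0.0:
        return 0.0  # no saliency mass at all: no overlap by convention
    return float(saliency[attribute_mask].sum() / total)

# Toy example: all saliency mass sits outside the masked attribute region.
sal = np.zeros((4, 4))
sal[0, 0] = 1.0
mask = np.zeros((4, 4), dtype=bool)
mask[2:, 2:] = True  # attribute occupies the bottom-right quadrant
print(saliency_attribute_overlap(sal, mask))  # 0.0
```

Under this reading, a successful debiasing method would drive such an overlap score toward zero on attribute-annotated test images, while a "superficially fair" model could keep its accuracy parity yet retain a high score.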