🤖 AI Summary
This work addresses the fairness limitations of Concept Bottleneck Models (CBMs), which, despite their interpretability, are prone to leaking sensitive attributes such as gender. To mitigate this, the study proposes the first systematic integration of three debiasing strategies within the CBM framework: top-k concept filtering to reduce information leakage, explicit removal of biased concepts, and an adversarial debiasing mechanism. This combined approach attenuates unwanted bias while preserving model interpretability. Experiments on benchmarks such as ImSitu show that the proposed method outperforms existing approaches, achieving a better trade-off between fairness and classification performance.
📝 Abstract
Ensuring fairness in image classification prevents models from perpetuating and amplifying bias. Concept bottleneck models (CBMs) map images to high-level, human-interpretable concepts before making predictions via a sparse, one-layer classifier. This structure enhances interpretability and, in theory, supports fairness by masking sensitive-attribute proxies such as facial features. However, CBM concepts are known to leak information unrelated to their semantics, and early results reveal only marginal reductions in gender bias on datasets like ImSitu. We propose three bias mitigation techniques to improve fairness in CBMs: (1) decreasing information leakage using a top-k concept filter, (2) removing biased concepts, and (3) adversarial debiasing. Our results outperform prior work on fairness-performance trade-offs, indicating that our debiased CBM is a significant step towards fair and interpretable image classification.
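The first two mitigations operate directly on the concept activation vector that sits between the image encoder and the final linear classifier. A minimal NumPy sketch, under our own assumptions (function names, masking-by-zeroing, and the toy activations are illustrative, not the paper's implementation); the third technique, adversarial debiasing, requires a trained adversary with gradient reversal and is omitted here:

```python
import numpy as np

def top_k_filter(concept_scores, k):
    """Keep only each example's k highest concept activations; zero the rest.

    Intuition from the abstract: limiting the bottleneck to a few active
    concepts reduces the channel through which unrelated (leaked)
    information can reach the classifier.
    """
    filtered = np.zeros_like(concept_scores)
    top_idx = np.argsort(concept_scores, axis=1)[:, -k:]      # indices of k largest per row
    rows = np.arange(concept_scores.shape[0])[:, None]
    filtered[rows, top_idx] = concept_scores[rows, top_idx]   # copy only the top-k values
    return filtered

def remove_biased_concepts(concept_scores, biased_idx):
    """Zero out concepts flagged (e.g. by inspection) as sensitive-attribute proxies."""
    out = concept_scores.copy()
    out[:, biased_idx] = 0.0
    return out

# Toy example: 2 images, 5 concepts; concept 0 is assumed to be a gender proxy.
scores = np.array([[0.9, 0.1, 0.7, 0.3, 0.05],
                   [0.2, 0.8, 0.1, 0.6, 0.4]])
filtered = top_k_filter(scores, k=2)
debiased = remove_biased_concepts(filtered, biased_idx=[0])
```

After both masks, the sparse linear classifier sees at most k activations per image, none of which belong to explicitly flagged biased concepts.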