🤖 AI Summary
This work addresses the fairness limitations of Concept Bottleneck Models (CBMs), which, despite their interpretability, are prone to leaking sensitive attributes such as gender. To mitigate this, the study proposes the first systematic integration of three debiasing strategies within the CBM framework: top-k concept filtering to reduce information leakage, explicit removal of biased concepts, and an adversarial debiasing mechanism. This combined approach attenuates unwanted bias while preserving model interpretability. Experiments on benchmarks such as ImSitu show that the proposed method outperforms existing approaches, achieving a better trade-off between fairness and classification performance.
📝 Abstract
Ensuring fairness in image classification prevents models from perpetuating and amplifying bias. Concept bottleneck models (CBMs) map images to high-level, human-interpretable concepts before making predictions via a sparse, one-layer classifier. This structure enhances interpretability and, in theory, supports fairness by masking sensitive-attribute proxies such as facial features. However, CBM concepts are known to leak information unrelated to their semantics, and early results reveal only marginal reductions in gender bias on datasets like ImSitu. We propose three bias mitigation techniques to improve fairness in CBMs: (1) decreasing information leakage using a top-k concept filter, (2) removing biased concepts, and (3) adversarial debiasing. Our results outperform prior work on fairness-performance trade-offs, indicating that our debiased CBM is a significant step towards fair and interpretable image classification.
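The first two mitigations operate directly on the concept activation vector that sits between the image encoder and the final linear classifier. A minimal NumPy sketch, under our own assumptions (function names, masking-by-zeroing, and the toy activations are illustrative, not the paper's implementation); the third technique, adversarial debiasing, requires a trained adversary with gradient reversal and is omitted here:

```python
import numpy as np

def top_k_filter(concept_scores, k):
    """Keep only each example's k highest concept activations; zero the rest.

    Intuition from the abstract: limiting the bottleneck to a few active
    concepts reduces the channel through which unrelated (leaked)
    information can reach the classifier.
    """
    filtered = np.zeros_like(concept_scores)
    top_idx = np.argsort(concept_scores, axis=1)[:, -k:]      # indices of k largest per row
    rows = np.arange(concept_scores.shape[0])[:, None]
    filtered[rows, top_idx] = concept_scores[rows, top_idx]   # copy only the top-k values
    return filtered

def remove_biased_concepts(concept_scores, biased_idx):
    """Zero out concepts flagged (e.g. by inspection) as sensitive-attribute proxies."""
    out = concept_scores.copy()
    out[:, biased_idx] = 0.0
    return out

# Toy example: 2 images, 5 concepts; concept 0 is assumed to be a gender proxy.
scores = np.array([[0.9, 0.1, 0.7, 0.3, 0.05],
                   [0.2, 0.8, 0.1, 0.6, 0.4]])
filtered = top_k_filter(scores, k=2)
debiased = remove_biased_concepts(filtered, biased_idx=[0])
```

After both masks, the sparse linear classifier sees at most k activations per image, none of which belong to explicitly flagged biased concepts.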