Mitigating Bias in Concept Bottleneck Models for Fair and Interpretable Image Classification

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fairness limitations of Concept Bottleneck Models (CBMs), which, despite their interpretability, are prone to leaking sensitive attributes such as gender. To mitigate this issue, the study proposes the first systematic integration of three debiasing strategies within the CBM framework: top-k concept filtering to reduce redundant information, explicit removal of biased concepts, and an adversarial debiasing mechanism. This combined approach effectively attenuates unwanted biases while preserving model interpretability. Experimental results on benchmarks such as ImSitu demonstrate that the proposed method significantly outperforms existing approaches, achieving a superior trade-off between fairness and classification performance.

📝 Abstract
Ensuring fairness in image classification prevents models from perpetuating and amplifying bias. Concept bottleneck models (CBMs) map images to high-level, human-interpretable concepts before making predictions via a sparse, one-layer classifier. This structure enhances interpretability and, in theory, supports fairness by masking sensitive-attribute proxies such as facial features. However, CBM concepts are known to leak information unrelated to concept semantics, and early results reveal only marginal reductions in gender bias on datasets such as ImSitu. We propose three bias mitigation techniques to improve fairness in CBMs: (1) decreasing information leakage using a top-k concept filter, (2) removing biased concepts, and (3) adversarial debiasing. Our results outperform prior work in terms of fairness-performance tradeoffs, indicating that our debiased CBM provides a significant step towards fair and interpretable image classification.
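The first two mitigation steps named in the abstract, top-k concept filtering and explicit removal of biased concepts, can be sketched as simple array operations on a batch of concept activations. This is a minimal illustrative sketch, not the authors' implementation: the function names, the toy activation scores, and the choice of which concept indices count as biased are all assumptions made here for demonstration.

```python
import numpy as np

def top_k_concept_filter(concept_scores, k):
    """Keep only the k highest-scoring concepts per sample; zero the rest.
    Intuition: low-relevance activations are a channel for information leakage,
    so the downstream classifier only sees the k most active concepts."""
    filtered = np.zeros_like(concept_scores)
    top_idx = np.argsort(concept_scores, axis=1)[:, -k:]  # indices of top-k per row
    rows = np.arange(concept_scores.shape[0])[:, None]
    filtered[rows, top_idx] = concept_scores[rows, top_idx]
    return filtered

def remove_biased_concepts(concept_scores, biased_indices):
    """Zero out concepts flagged (e.g., by a fairness audit) as proxies
    for a sensitive attribute. The flagged indices are an assumption here."""
    debiased = concept_scores.copy()
    debiased[:, biased_indices] = 0.0
    return debiased

# Toy concept activations: 2 samples x 5 concepts.
scores = np.array([[0.9, 0.1, 0.4, 0.05, 0.7],
                   [0.2, 0.8, 0.3, 0.6, 0.1]])
filtered = top_k_concept_filter(scores, k=2)          # row 0 keeps concepts 0 and 4
debiased = remove_biased_concepts(filtered, [0])      # then concept 0 is removed
```

Applying both filters in sequence mirrors the paper's pipeline ordering as described in the summary: the top-k filter first limits leakage, and the biased-concept mask then strips known proxy concepts before the sparse classifier sees them.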
Problem

Research questions and friction points this paper is trying to address.

bias mitigation
concept bottleneck models
fairness
image classification
information leakage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept Bottleneck Models
Bias Mitigation
Adversarial Debiasing
Fairness
Interpretable AI
Schrasing Tong
Massachusetts Institute of Technology, United States
Antoine Salaun
Massachusetts Institute of Technology, United States
Vincent Yuan
Massachusetts Institute of Technology, United States
Annabel Adeyeri
Massachusetts Institute of Technology, United States
Lalana Kagal
Massachusetts Institute of Technology, United States
artificial intelligence, knowledge representation, privacy, computer systems