🤖 AI Summary
CLIP-based concept bottleneck models (CBMs) suffer from concept hallucination during zero-shot concept extraction, leading to erroneous judgments about whether a concept is present and undermining the reliability of the resulting explanations. To address this, we propose Concept Hallucination Inhibition via Localized Interpretability (CHILI), a method that enables pixel-level concept localization from image-level CLIP features via a local interpretability-guided embedding disentanglement mechanism. CHILI first isolates local features semantically relevant to the target concept and then synthesizes high-fidelity attention maps without requiring additional annotations or model fine-tuning. Experiments demonstrate that CHILI significantly reduces concept misclassification rates and produces attribution maps with superior fidelity and interpretability compared to existing zero-shot CBM approaches. By mitigating hallucination while preserving zero-shot capability, CHILI establishes a new paradigm for trustworthy, vision-language-model-based eXplainable AI (XAI).
📝 Abstract
This paper addresses explainable AI (XAI) through the lens of Concept Bottleneck Models (CBMs) that do not require explicit concept annotations, relying instead on concepts extracted with CLIP in a zero-shot manner. We show that CLIP, which is central to these techniques, is prone to concept hallucination: it incorrectly predicts the presence or absence of concepts within an image in scenarios common to numerous CBMs, thereby undermining the faithfulness of explanations. To mitigate this issue, we introduce Concept Hallucination Inhibition via Localized Interpretability (CHILI), a technique that disentangles image embeddings and localizes the pixels corresponding to target concepts. Furthermore, our approach supports the generation of saliency-based explanations that are more interpretable.
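The zero-shot concept extraction that the abstract identifies as hallucination-prone can be sketched as follows. This is a minimal illustration, not the paper's method: in actual CBM pipelines the embeddings would come from CLIP's image and text encoders, whereas here random vectors stand in for them, and the fixed similarity threshold is a hypothetical choice of the kind these pipelines commonly use.

```python
import numpy as np

def concept_scores(image_emb, concept_embs):
    """Cosine similarity between one image embedding and each concept text
    embedding. Stand-in for CLIP's image/text encoder outputs."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    return txt @ img  # one similarity score per concept

def predict_presence(scores, threshold=0.25):
    """Zero-shot presence decision: a concept is declared present when its
    similarity exceeds a fixed threshold. Because the decision is made from a
    single global image embedding, it is exactly the kind of judgment that can
    hallucinate concepts the image does not contain."""
    return scores > threshold

# Placeholder embeddings (random stand-ins, dimension 512 as in common CLIP variants)
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)           # stand-in for a CLIP image embedding
concept_embs = rng.normal(size=(3, 512))   # stand-ins for three concept prompts

scores = concept_scores(image_emb, concept_embs)
present = predict_presence(scores)
```

CHILI's contribution, as described above, is to replace this single global-embedding decision with disentangled, pixel-localized evidence for each concept.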