🤖 AI Summary
CLIP-based concept bottleneck models (CBMs) suffer from concept hallucination during zero-shot concept extraction, leading to erroneous judgments about whether a concept is present and undermining the reliability of the resulting explanations. To address this, we propose Concept Hallucination Inhibition via Localized Interpretability (CHILI), a method that enables pixel-level concept localization from image-level CLIP features via a local interpretability-guided embedding disentanglement mechanism. CHILI first isolates local features semantically relevant to the target concept and then synthesizes high-fidelity attention maps without requiring additional annotations or model fine-tuning. Experiments demonstrate that CHILI significantly reduces concept misclassification rates and produces attribution maps with superior fidelity and interpretability compared to existing zero-shot CBM approaches. By mitigating hallucination while preserving zero-shot capability, CHILI establishes a new paradigm for trustworthy, vision-language-model-based eXplainable AI (XAI).
📝 Abstract
This paper addresses explainable AI (XAI) through the lens of Concept Bottleneck Models (CBMs) that do not require explicit concept annotations, relying instead on concepts extracted with CLIP in a zero-shot manner. We show that CLIP, which is central to these techniques, is prone to concept hallucination: it incorrectly predicts the presence or absence of concepts within an image in scenarios common to numerous CBMs, thereby undermining the faithfulness of explanations. To mitigate this issue, we introduce Concept Hallucination Inhibition via Localized Interpretability (CHILI), a technique that disentangles image embeddings and localizes the pixels corresponding to target concepts. Furthermore, our approach supports the generation of saliency-based explanations that are more interpretable.
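The zero-shot concept extraction that the abstract identifies as hallucination-prone can be sketched as follows. This is a minimal illustration, not the paper's method: in actual CBM pipelines the embeddings would come from CLIP's image and text encoders, whereas here random vectors stand in for them, and the fixed similarity threshold is a hypothetical choice of the kind these pipelines commonly use.

```python
import numpy as np

def concept_scores(image_emb, concept_embs):
    """Cosine similarity between one image embedding and each concept text
    embedding. Stand-in for CLIP's image/text encoder outputs."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    return txt @ img  # one similarity score per concept

def predict_presence(scores, threshold=0.25):
    """Zero-shot presence decision: a concept is declared present when its
    similarity exceeds a fixed threshold. Because the decision is made from a
    single global image embedding, it is exactly the kind of judgment that can
    hallucinate concepts the image does not contain."""
    return scores > threshold

# Placeholder embeddings (random stand-ins, dimension 512 as in common CLIP variants)
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)           # stand-in for a CLIP image embedding
concept_embs = rng.normal(size=(3, 512))   # stand-ins for three concept prompts

scores = concept_scores(image_emb, concept_embs)
present = predict_presence(scores)
```

CHILI's contribution, as described above, is to replace this single global-embedding decision with disentangled, pixel-localized evidence for each concept.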