Towards Spatially-Aware and Optimally Faithful Concept-Based Explanations

📅 2025-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Prior faithfulness metrics for unsupervised concept-based explanation methods (U-CBEMs) consider only which concepts are present and ignore how those concepts are spatially distributed, which leads to inaccurate faithfulness estimates. This work addresses post-hoc concept explanation for deep neural networks with Surrogate Faithfulness (SF), an evaluation method built on a spatially-aware surrogate and two new faithfulness metrics, and with Optimally Faithful (OF) explanations, whose concepts are found by maximizing faithfulness under SF. The reported experiments show that adding spatial awareness increases the faithfulness of prior U-CBEMs in all cases, that OF improves error by 30% or more over prior U-CBEMs, and that OF's learned concepts generalize well to out-of-domain data and are more robust to adversarial examples than those of prior U-CBEMs.

📝 Abstract

Post-hoc, unsupervised concept-based explanation methods (U-CBEMs) are a promising tool for generating semantic explanations of the decision-making processes in deep neural networks, having applications in both model improvement and understanding. It is vital that the explanation is accurate, or faithful, to the model, yet we identify several limitations of prior faithfulness metrics that inhibit an accurate evaluation; most notably, prior metrics involve only the set of concepts present, ignoring how they may be spatially distributed. We address these limitations with Surrogate Faithfulness (SF), an evaluation method that introduces a spatially-aware surrogate and two novel faithfulness metrics. Using SF, we produce Optimally Faithful (OF) explanations, where concepts are found that maximize faithfulness. Our experiments show that (1) adding spatial-awareness to prior U-CBEMs increases faithfulness in all cases; (2) OF produces significantly more faithful explanations than prior U-CBEMs (30% or higher improvement in error); (3) OF's learned concepts generalize well to out-of-domain data and are more robust to adversarial examples, where prior U-CBEMs struggle.
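
The gap the abstract describes, between knowing which concepts are present and knowing where they occur, can be illustrated with a small sketch. This is not the paper's SF method: the function names `presence_features` and `spatial_features`, the max-pooling choice, and the toy 2x2 concept maps are all assumptions made for illustration only.

```python
import numpy as np

def presence_features(concept_map):
    """Hypothetical presence-only summary: one value per concept.

    concept_map has shape (H, W, K): the activation of K concepts
    at each of H x W spatial locations. Pooling over space keeps
    only whether (and how strongly) each concept appears anywhere.
    """
    return concept_map.max(axis=(0, 1))  # shape (K,)

def spatial_features(concept_map):
    """Hypothetical spatially-aware summary: keeps every location."""
    return concept_map.reshape(-1)  # shape (H * W * K,)

# Two toy inputs with the same single concept at different locations.
a = np.zeros((2, 2, 1))
a[0, 0, 0] = 1.0  # concept active in the top-left cell
b = np.zeros((2, 2, 1))
b[1, 1, 0] = 1.0  # concept active in the bottom-right cell

# A presence-only view cannot tell the two inputs apart ...
same_presence = np.array_equal(presence_features(a), presence_features(b))
# ... while a view that retains spatial layout can.
same_spatial = np.array_equal(spatial_features(a), spatial_features(b))

print(same_presence, same_spatial)
```

Any surrogate fitted only on the pooled features must give both inputs the same prediction, even if the explained model treats them differently. That is the kind of error a spatially-aware surrogate is meant to avoid.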

Problem

Research questions and friction points this paper is trying to address.

Improving faithfulness of concept-based explanations in neural networks

Addressing spatial-awareness gaps in prior faithfulness metrics

Enhancing robustness and generalization of learned concepts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces spatially-aware surrogate for evaluation

Maximizes faithfulness with Optimally Faithful explanations

Learns concepts that generalize to out-of-domain data and are more robust to adversarial examples
