Towards Spatially-Aware and Optimally Faithful Concept-Based Explanations

2025-04-15
AI Summary
Existing unsupervised concept-based explanation methods (U-CBEMs) consider only which concepts are present and ignore how those concepts are spatially distributed, which leads to inaccurate faithfulness estimates. This work targets post-hoc concept explanation for deep neural networks and proposes a spatially aware Surrogate Faithfulness (SF) evaluation framework together with an Optimally Faithful (OF) optimization paradigm, described as the first to explicitly incorporate spatial structure into both the faithfulness metrics and the optimization objective. The approach combines surrogate model evaluation, spatially aware concept activation modeling, and gradient-driven concept search. Experiments across multiple benchmarks show a 30% or higher improvement in faithfulness error over prior U-CBEMs, and the learned concepts generalize well to out-of-distribution data and are more robust to adversarial examples than those of prior U-CBEMs.

Abstract
Post-hoc, unsupervised concept-based explanation methods (U-CBEMs) are a promising tool for generating semantic explanations of the decision-making processes in deep neural networks, having applications in both model improvement and understanding. It is vital that the explanation is accurate, or faithful, to the model, yet we identify several limitations of prior faithfulness metrics that inhibit an accurate evaluation; most notably, prior metrics involve only the set of concepts present, ignoring how they may be spatially distributed. We address these limitations with Surrogate Faithfulness (SF), an evaluation method that introduces a spatially-aware surrogate and two novel faithfulness metrics. Using SF, we produce Optimally Faithful (OF) explanations, where concepts are found that maximize faithfulness. Our experiments show that (1) adding spatial-awareness to prior U-CBEMs increases faithfulness in all cases; (2) OF produces significantly more faithful explanations than prior U-CBEMs (30% or higher improvement in error); (3) OF's learned concepts generalize well to out-of-domain data and are more robust to adversarial examples, where prior U-CBEMs struggle.
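The limitation the abstract identifies, that a description based only on the set of concepts present ignores where they occur, can be illustrated with a toy sketch. This is not the paper's SF surrogate or its metrics: the grid-shaped "concept activation maps" and the helper names `presence_vector` and `spatial_vector` are hypothetical, chosen only to show that two inputs can share identical concept presence while differing in spatial layout.

```python
# Toy illustration only: not the paper's SF surrogate or metrics.
# A "concept activation map" here is a hypothetical H x W grid per concept.

def presence_vector(maps):
    """Presence-only summary: one pooled (mean) activation per concept."""
    return [sum(sum(row) for row in m) / (len(m) * len(m[0])) for m in maps]

def spatial_vector(maps):
    """Spatially aware summary: keep every location of every concept."""
    return [v for m in maps for row in m for v in row]

# Two inputs, one concept each, active in opposite corners of a 2 x 2 grid.
input_a = [[[1.0, 0.0],
            [0.0, 0.0]]]
input_b = [[[0.0, 0.0],
            [0.0, 1.0]]]

# The pooled summaries coincide, so any surrogate fed only presence
# information must give both inputs the same prediction.
same_presence = presence_vector(input_a) == presence_vector(input_b)

# The flattened maps differ, so a surrogate fed spatial information
# can in principle tell the two inputs apart.
same_spatial = spatial_vector(input_a) == spatial_vector(input_b)

print(same_presence, same_spatial)  # prints: True False
```

If the underlying model's output depends on where a concept appears, a presence-only surrogate cannot reproduce that dependence, which is one way to read the abstract's argument for a spatially aware surrogate.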
Problem

Research questions and friction points this paper is trying to address.

Improving faithfulness of concept-based explanations in neural networks
Addressing spatial-awareness gaps in prior faithfulness metrics
Enhancing robustness and generalization of learned concepts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces spatially-aware surrogate for evaluation
Maximizes faithfulness with Optimally Faithful explanations
Improves robustness to adversarial examples significantly