AI Summary
Existing unsupervised concept-based explanation methods (U-CBEMs) and their faithfulness metrics consider only which concepts are present, not how those concepts are spatially distributed, which leads to inaccurate faithfulness estimates. This work addresses post-hoc concept explanation of deep neural networks by proposing Surrogate Faithfulness (SF), a spatially aware evaluation framework, and Optimally Faithful (OF) explanations, an optimization paradigm. It is presented as the first approach to explicitly incorporate spatial structure into both the faithfulness metrics and the optimization objective. The approach combines surrogate-model evaluation, spatially aware modeling of concept activations, and gradient-based search for concepts. Experiments across multiple benchmarks show a reduction in faithfulness error of 30% or more relative to prior U-CBEMs, and the learned concepts generalize well: they are markedly more robust than state-of-the-art baselines on out-of-distribution data and adversarial examples.
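The central criticism, that presence-only faithfulness ignores spatial layout, can be illustrated with a toy example. This is a hypothetical sketch, not the paper's SF definition: the position-weighted model head, the ReLU concept projection, and every function name (`concept_maps`, `model_head`, `presence_surrogate`, `spatial_surrogate`) are assumptions made for illustration. Two inputs contain the same two concepts in swapped positions. The toy model's outputs differ, but the pooled concept scores are identical, so no presence-only linear surrogate can match both. A surrogate that keeps per-location concept activations can match both.

```python
import numpy as np

def concept_maps(features, concepts):
    """Per-location concept activations (hypothetical): ReLU of projections.
    features: (positions, d), concepts: (k, d) -> (positions, k)."""
    return np.maximum(features @ concepts.T, 0.0)

def model_head(features, pos_weights, v):
    """Toy model head (hypothetical): a position-weighted linear readout,
    so its output depends on WHERE each feature appears."""
    return float(pos_weights @ (features @ v))

def presence_surrogate(features, concepts, beta):
    """Presence-only surrogate: spatially pooled concept scores, then a linear map."""
    pooled = concept_maps(features, concepts).mean(axis=0)
    return float(pooled @ beta)

def spatial_surrogate(features, concepts, pos_weights, v):
    """Spatially aware surrogate: rebuild each location's features from its
    own concept activations, then run the model's head on the rebuild."""
    recon = concept_maps(features, concepts) @ concepts
    return model_head(recon, pos_weights, v)

concepts = np.eye(2)                 # two concepts = the two feature axes
pos_weights = np.array([1.0, 0.0])   # toy model only reads the first ("left") location
v = np.array([1.0, -1.0])

x = np.array([[1.0, 0.0], [0.0, 1.0]])  # concept 0 on the left, concept 1 on the right
y = x[::-1].copy()                       # same concepts, positions swapped
images = (x, y)
targets = np.array([model_head(img, pos_weights, v) for img in images])  # [1.0, -1.0]

# Best presence-only surrogate: least-squares fit of beta on pooled concept scores.
pooled = np.stack([concept_maps(img, concepts).mean(axis=0) for img in images])
beta, *_ = np.linalg.lstsq(pooled, targets, rcond=None)
presence_preds = np.array([presence_surrogate(img, concepts, beta) for img in images])
presence_err = np.mean((presence_preds - targets) ** 2)

spatial_preds = np.array([spatial_surrogate(img, concepts, pos_weights, v) for img in images])
spatial_err = np.mean((spatial_preds - targets) ** 2)
print(f"presence-only MSE: {presence_err:.3f}, spatial MSE: {spatial_err:.3f}")
```

Both images pool to the same concept vector `[0.5, 0.5]`, so the best presence-only fit predicts the same value for both and is left with a mean squared error of about 1.0. The spatial surrogate reproduces the toy model exactly.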
Abstract
Post-hoc, unsupervised concept-based explanation methods (U-CBEMs) are a promising tool for generating semantic explanations of the decision-making processes of deep neural networks, with applications in both model improvement and understanding. It is vital that the explanation be accurate, or faithful, to the model, yet we identify several limitations of prior faithfulness metrics that inhibit accurate evaluation; most notably, prior metrics consider only the set of concepts present, ignoring how those concepts are spatially distributed. We address these limitations with Surrogate Faithfulness (SF), an evaluation method that introduces a spatially-aware surrogate and two novel faithfulness metrics. Using SF, we produce Optimally Faithful (OF) explanations, in which the concepts are found by maximizing faithfulness. Our experiments show that (1) adding spatial awareness to prior U-CBEMs increases faithfulness in all cases; (2) OF produces significantly more faithful explanations than prior U-CBEMs (an improvement in error of 30% or more); (3) OF's learned concepts generalize well to out-of-domain data and are more robust to adversarial examples, where prior U-CBEMs struggle.
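OF finds concepts that maximize faithfulness. The sketch below shows the general shape of such a search: gradient descent on a concept matrix to minimize a spatially aware surrogate's error. It relies on several simplifying assumptions that do not come from the paper. The model head is a toy linear, position-weighted readout. The concept projection is linear (no ReLU), so the gradient has a closed form. Faithfulness is measured as mean squared output error, and step sizes are chosen by backtracking. All names (`surrogate_error`, `surrogate_grad`, `optimize_concepts`) are hypothetical.

```python
import numpy as np

def surrogate_error(C, feats, pos_w, w):
    """Mean squared gap between the toy model and its concept surrogate.
    feats: (n, positions, d); C: (k, d). The surrogate rebuilds every
    location as (feat @ C.T) @ C and applies the same linear head."""
    recon = feats @ C.T @ C                               # (n, positions, d)
    model_out = np.einsum("p,npd,d->n", pos_w, feats, w)
    surr_out = np.einsum("p,npd,d->n", pos_w, recon, w)
    return np.mean((surr_out - model_out) ** 2)

def surrogate_grad(C, feats, pos_w, w):
    """Closed-form gradient of surrogate_error w.r.t. C (linear case only)."""
    B = np.einsum("p,npd->nd", pos_w, feats)   # position-weighted features, (n, d)
    P = B @ C.T                                 # concept scores, (n, k)
    v = C @ w                                   # head weights seen by each concept, (k,)
    r = 2.0 * (P @ v - B @ w) / B.shape[0]      # dL / d(surrogate output), (n,)
    return np.outer(v, B.T @ r) + np.outer(P.T @ r, w)

def optimize_concepts(feats, pos_w, w, k, steps=200, lr=0.1, seed=0):
    """Gradient descent on C with backtracking, so the error never increases."""
    rng = np.random.default_rng(seed)
    C = 0.1 * rng.standard_normal((k, feats.shape[-1]))
    loss = surrogate_error(C, feats, pos_w, w)
    for _ in range(steps):
        g = surrogate_grad(C, feats, pos_w, w)
        step = lr
        while step > 1e-8:  # halve the step until the error goes down
            cand = C - step * g
            cand_loss = surrogate_error(cand, feats, pos_w, w)
            if cand_loss < loss:
                C, loss = cand, cand_loss
                break
            step *= 0.5
    return C, loss

# Usage on random toy data: 32 "images", 4 locations, 8-dim features, 3 concepts.
rng = np.random.default_rng(1)
feats = rng.standard_normal((32, 4, 8))
pos_w = np.array([1.0, 0.5, 0.0, 0.0])   # toy head weights locations differently
w = rng.standard_normal(8)
C0 = 0.1 * np.random.default_rng(0).standard_normal((3, 8))  # same init as optimize_concepts
init_loss = surrogate_error(C0, feats, pos_w, w)
C, final_loss = optimize_concepts(feats, pos_w, w, k=3)
print(f"surrogate error: {init_loss:.4f} -> {final_loss:.4f}")
```

The backtracking rule keeps the error monotonically non-increasing without tuning a learning rate. With a nonlinear projection or a real network head, an automatic-differentiation framework would replace the hand-derived gradient.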