🤖 AI Summary
Existing feature attribution methods for image explanation suffer from poor robustness, uncontrolled fidelity, and reliance on ground-truth annotations for calibration, which severely limits their trustworthiness. This paper proposes a conformal prediction-based framework for trustworthy image explanation that eliminates the need for ground-truth explanation supervision: it identifies a minimal subset of salient features sufficient to preserve the model's prediction, and introduces four novel conformity scores that quantify how well explanations conform to the model's predictions, giving users direct control over explanation fidelity. Experiments with five explainers across six image benchmarks demonstrate: (i) significantly improved explanation stability and trustworthiness; (ii) FastSHAP's superiority over competing explainers in both fidelity and informational efficiency, measured by explanation compactness; and (iii) superpixel-level conformity measures yielding greater discriminability and robustness than pixel-level alternatives.
📝 Abstract
Feature attribution methods are widely used for explaining image-based predictions, as they provide feature-level insights that can be intuitively visualized. However, such explanations often vary in their robustness and may fail to faithfully reflect the reasoning of the underlying black-box model. To address these limitations, we propose a novel conformal prediction-based approach that enables users to directly control the fidelity of the generated explanations. The method identifies a subset of salient features that is sufficient to preserve the model's prediction, regardless of the information carried by the excluded features, and without requiring access to ground-truth explanations for calibration. Four conformity functions are proposed to quantify the extent to which explanations conform to the model's predictions. The approach is empirically evaluated using five explainers across six image datasets. The results demonstrate that FastSHAP consistently outperforms the competing methods in terms of both fidelity and informational efficiency, the latter measured by the size of the explanation regions. Furthermore, the results reveal that conformity measures based on superpixels are more effective than their pixel-wise counterparts.
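The abstract does not spell out the calibration procedure, but the general recipe it alludes to (split-conformal calibration of a conformity score, then growing the salient-feature set until the score clears a user-controlled threshold) can be sketched as follows. This is a minimal illustration under assumed definitions, not the paper's implementation: the conformity score, the simulated calibration data, and the `minimal_explanation` helper are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conformity score: the probability that the input masked to the
# top-k salient superpixels assigns to the class predicted on the full input
# (higher = the explanation better preserves the model's prediction).
def conformity(full_probs, masked_probs):
    return masked_probs[np.argmax(full_probs)]

# --- Split-conformal calibration (standard recipe, not the paper's exact one) ---
# Simulated conformity scores on a held-out calibration set of n images.
n = 500
cal_scores = rng.beta(5, 2, size=n)           # stand-in for real scores

alpha = 0.1                                    # user-chosen miscoverage level
# Finite-sample-corrected lower quantile: at least 1 - alpha of calibration
# scores lie at or above tau.
k = int(np.floor(alpha * (n + 1)))
tau = np.sort(cal_scores)[max(k - 1, 0)]

# At test time, reveal superpixels in decreasing attribution order until the
# conformity score clears tau -- the smallest set certified at level 1 - alpha.
def minimal_explanation(scores_by_size, tau):
    for size, s in enumerate(scores_by_size, start=1):
        if s >= tau:
            return size
    return len(scores_by_size)                 # fall back to all superpixels

# Simulated conformity as progressively more superpixels are revealed.
scores_by_size = np.sort(rng.beta(2, 2, size=20))
size = minimal_explanation(scores_by_size, tau)
print(f"threshold tau={tau:.3f}, certified explanation size={size}")
```

Lowering `alpha` raises the threshold `tau`, which forces larger explanation regions; this is the fidelity/compactness trade-off the user controls directly.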