🤖 AI Summary
Existing feature attribution methods for image explanation suffer from poor robustness, uncontrolled fidelity, and reliance on ground-truth annotations for calibration, which severely limits their trustworthiness. This paper proposes a conformal prediction-based framework for trustworthy image explanation that eliminates the need for ground-truth explanation supervision: it identifies a minimal subset of salient features sufficient to preserve the model's prediction, and introduces four novel conformity scores that quantify how well explanations conform to the model's predictions, giving users direct control over explanation fidelity. Experiments with five explainers across six image benchmarks demonstrate: (i) significantly improved explanation stability and trustworthiness; (ii) FastSHAP's superiority over competing explainers in both fidelity and informational efficiency, measured by explanation compactness; and (iii) superpixel-level conformity measures yielding greater discriminability and robustness than pixel-level alternatives.
📝 Abstract
Feature attribution methods are widely used for explaining image-based predictions, as they provide feature-level insights that can be intuitively visualized. However, such explanations often vary in their robustness and may fail to faithfully reflect the reasoning of the underlying black-box model. To address these limitations, we propose a novel conformal prediction-based approach that enables users to directly control the fidelity of the generated explanations. The method identifies a subset of salient features that is sufficient to preserve the model's prediction, regardless of the information carried by the excluded features, and without requiring access to ground-truth explanations for calibration. Four conformity functions are proposed to quantify the extent to which explanations conform to the model's predictions. The approach is empirically evaluated using five explainers across six image datasets. The results demonstrate that FastSHAP consistently outperforms the competing methods in terms of both fidelity and informational efficiency, the latter measured by the size of the explanation regions. Furthermore, the results reveal that conformity measures based on superpixels are more effective than their pixel-wise counterparts.
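The abstract does not spell out the calibration procedure, but the general recipe it alludes to (split-conformal calibration of a conformity score, then growing the salient-feature set until the score clears a user-controlled threshold) can be sketched as follows. This is a minimal illustration under assumed definitions, not the paper's implementation: the conformity score, the simulated calibration data, and the `minimal_explanation` helper are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conformity score: the probability that the input masked to the
# top-k salient superpixels assigns to the class predicted on the full input
# (higher = the explanation better preserves the model's prediction).
def conformity(full_probs, masked_probs):
    return masked_probs[np.argmax(full_probs)]

# --- Split-conformal calibration (standard recipe, not the paper's exact one) ---
# Simulated conformity scores on a held-out calibration set of n images.
n = 500
cal_scores = rng.beta(5, 2, size=n)           # stand-in for real scores

alpha = 0.1                                    # user-chosen miscoverage level
# Finite-sample-corrected lower quantile: at least 1 - alpha of calibration
# scores lie at or above tau.
k = int(np.floor(alpha * (n + 1)))
tau = np.sort(cal_scores)[max(k - 1, 0)]

# At test time, reveal superpixels in decreasing attribution order until the
# conformity score clears tau -- the smallest set certified at level 1 - alpha.
def minimal_explanation(scores_by_size, tau):
    for size, s in enumerate(scores_by_size, start=1):
        if s >= tau:
            return size
    return len(scores_by_size)                 # fall back to all superpixels

# Simulated conformity as progressively more superpixels are revealed.
scores_by_size = np.sort(rng.beta(2, 2, size=20))
size = minimal_explanation(scores_by_size, tau)
print(f"threshold tau={tau:.3f}, certified explanation size={size}")
```

Lowering `alpha` raises the threshold `tau`, which forces larger explanation regions; this is the fidelity/compactness trade-off the user controls directly.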