🤖 AI Summary
This work addresses the explainability and robustness of contrastive explanations in image classification—specifically, why a model prefers one class over others. We propose a novel contrastive explanation method grounded in concept-class relevance and instance embedding similarity. By fine-tuning the model, we extract human-interpretable concepts and their quantitative relevance scores to classes, enabling generation of concise, stable, and semantically meaningful contrastive explanations. Key findings show that high-relevance concepts substantially reduce explanation complexity and improve cross-sample consistency; moreover, explanations driven by high-relevance concepts exhibit superior robustness under image perturbations such as rotation and noise. To our knowledge, this is the first work to systematically establish quantitative relationships between concept relevance and both explanation complexity and robustness, thereby providing a verifiable paradigm for generating trustworthy AI explanations. (149 words)
📝 Abstract
Understanding why a classification model prefers one class over another for an input instance is the challenge of contrastive explanation. This work implements concept-based contrastive explanations for image classification by leveraging the similarity of instance embeddings and relevance of human-understandable concepts used by a fine-tuned deep learning model. Our approach extracts concepts with their relevance score, computes contrasts for similar instances, and evaluates the resulting contrastive explanations based on explanation complexity. Robustness is tested for different image augmentations. Two research questions are addressed: (1) whether explanation complexity varies across different relevance ranges, and (2) whether explanation complexity remains consistent under image augmentations such as rotation and noise. The results confirm that for our experiments higher concept relevance leads to shorter, less complex explanations, while lower relevance results in longer, more diffuse explanations. Additionally, explanations show varying degrees of robustness. The discussion of these findings offers insights into the potential of building more interpretable and robust AI systems.