🤖 AI Summary
Machine learning models with similar predictive performance can produce substantially different post-hoc explanations, such as feature importance scores, which complicates the assessment of explanation reliability. This work proposes a model-agnostic evaluation framework that operates without ground-truth explanation labels. It formalizes five metamorphic relations between model behavior and feature attributions and uses them to assess explanation faithfulness. The method combines metamorphic testing with SHAP, LIME, and Rashomon set analysis, and is demonstrated on two tabular regression datasets. The approach provides a practical tool for selecting models that are not only highly accurate but also yield trustworthy explanations.
📝 Abstract
Multiple machine learning models can achieve near-equivalent predictive performance on the same task, yet provide divergent feature-based explanations. This is called the Rashomon effect of (explainable) machine learning, and it raises the question of which explanations, if any, are trustworthy. We propose a framework based on metamorphic testing that assesses explanation faithfulness without requiring ground-truth labels, by examining the feature importance attributed by post-hoc explanation methods. Five metamorphic relations formalize expected consistency properties between model behavior and feature attributions. We apply this general framework to two tabular regression datasets and two post-hoc explainers (SHAP and LIME) to demonstrate the approach. The framework offers a practical, model-agnostic tool for selecting accurate models with reliable and trustworthy explanations.
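
To make the idea of a metamorphic relation on feature attributions concrete, the sketch below checks one such relation on synthetic data. The abstract does not spell out the five relations, so the relation here is a hypothetical illustration and not one taken from the paper: perturbing the feature with the highest attribution should shift the model's predictions at least as much as perturbing the feature with the lowest attribution. The attribution function is a simple stand-in (absolute weight times feature standard deviation for a linear model) rather than SHAP or LIME, and all function names are assumptions made for this example.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares with an appended intercept column.
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    # Last entry of coef is the intercept.
    return X @ coef[:-1] + coef[-1]

def attributions(coef, X):
    # Stand-in global attribution (NOT SHAP or LIME):
    # absolute weight scaled by the feature's standard deviation.
    return np.abs(coef[:-1]) * X.std(axis=0)

def prediction_shift(coef, X, j, rng):
    # Mean absolute change in predictions after permuting feature j.
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    return np.mean(np.abs(predict(coef, X_perm) - predict(coef, X)))

def relation_holds(coef, X, rng):
    # Hypothetical metamorphic relation: perturbing the top-attributed
    # feature should move predictions at least as much as perturbing
    # the bottom-attributed one. No ground-truth explanation is needed.
    attr = attributions(coef, X)
    top, bottom = int(np.argmax(attr)), int(np.argmin(attr))
    return bool(
        prediction_shift(coef, X, top, rng)
        >= prediction_shift(coef, X, bottom, rng)
    )

# Synthetic regression task: feature 0 dominates, feature 2 is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

coef = fit_linear(X, y)
holds = relation_holds(coef, X, rng)
print("relation holds:", holds)
```

In a Rashomon-set setting, a check of this kind could be run for every near-equally accurate model and every explainer, and the fraction of satisfied relations compared across them. That aggregation step is likewise an assumption about how such a framework might be used, not a description of the paper's procedure.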