🤖 AI Summary
This study addresses the challenge that local interpretability methods often produce seemingly plausible yet unfaithful explanations for complex tabular data. To rigorously evaluate explanation fidelity, robustness, and complexity, the authors construct a comprehensive benchmarking framework encompassing multiple models and datasets, incorporating for the first time a prediction-consistency grouping strategy. They systematically assess prominent methods—including LIME, Kernel SHAP, and feature ablation—across 32 tabular datasets. The findings reveal that explanation quality shows no significant correlation with model accuracy; instead, it is predominantly influenced by data complexity and feature distribution, particularly on samples where models consistently err. This work provides a novel perspective and empirical foundation for trustworthy evaluation in explainable AI.
📝 Abstract
Despite the wide use of explainability techniques to attempt to understand the behavior of Artificial Intelligence (AI), the generated explanations may not always be reliable. An explanation can appear plausible to humans but fail to capture the internal reasoning of a model, particularly when dealing with complex tabular data. This paper studies the trustworthiness of local explainability techniques when applied to complex tabular classification tasks, considering evaluated metrics for three main properties: faithfulness to the model's predictions, robustness to input data variations, and complexity of the explanation itself. A benchmark was performed for Local Interpretable Model-Agnostic Explanations (LIME), Kernel SHapley Additive exPlanations (SHAP), and Feature Ablation techniques, across 32 datasets and different types of machine learning models. Model performance ranges were analyzed to identify two groups: consensus-correct, which are samples that all models predicted correctly, and consensus-wrong, samples that all models predicted incorrectly. The obtained results demonstrate that that the explanations are not always correlated with a model's predictive performance. Instead, dataset complexity and feature distributions seem to be the main factors affecting explanation quality and reliability.