🤖 AI Summary
This paper identifies a fundamental trade-off between generalization performance and counterfactual explainability in supervised learning: poorer generalization (i.e., stronger overfitting) facilitates easier generation of counterfactual examples.
Method: We propose a novel metric—ε-valid counterfactual probability (ε-VCP)—grounded in the geometric properties of decision boundaries and local input perturbations. We show theoretically that ε-VCP tends to increase with overfitting severity, establishing it as a practical proxy for generalization degradation.
Contribution/Results: This work establishes, for the first time, a formal theoretical connection between generalization and counterfactual explainability. Empirical validation across multiple benchmark datasets confirms that ε-VCP consistently tracks declining generalization performance, making it a quantifiable, interpretable diagnostic for both model robustness assessment and explainability evaluation.
📝 Abstract
In this work, we investigate the relationship between model generalization and counterfactual explainability in supervised learning. We introduce the notion of $\varepsilon$-valid counterfactual probability ($\varepsilon$-VCP) -- the probability of finding perturbations of a data point within its $\varepsilon$-neighborhood that result in a label change. We provide a theoretical analysis of $\varepsilon$-VCP in relation to the geometry of the model's decision boundary, showing that $\varepsilon$-VCP tends to increase with model overfitting. Our findings establish a rigorous connection between poor generalization and the ease of counterfactual generation, revealing an inherent trade-off between generalization and counterfactual explainability. Empirical results validate our theory, suggesting $\varepsilon$-VCP as a practical proxy for quantitatively characterizing overfitting.
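The $\varepsilon$-VCP quantity defined above can be estimated directly by sampling. The sketch below is a minimal Monte Carlo estimator under assumptions not stated in the abstract: the $\varepsilon$-neighborhood is taken as an $L^\infty$ ball with uniform sampling, and the classifier is any function mapping a batch of inputs to predicted labels. The function name `epsilon_vcp` and its signature are illustrative, not from the paper.

```python
import numpy as np

def epsilon_vcp(predict, X, epsilon, n_samples=1000, rng=None):
    """Monte Carlo estimate of epsilon-VCP for each row of X.

    Samples uniform perturbations in the L-infinity ball of radius
    epsilon around each point and returns, per point, the fraction of
    perturbations whose predicted label differs from the original one.
    (Norm choice and sampler are illustrative assumptions.)
    """
    rng = np.random.default_rng(rng)
    base = predict(X)  # labels of the unperturbed points
    flips = np.zeros(len(X))
    for _ in range(n_samples):
        noise = rng.uniform(-epsilon, epsilon, size=X.shape)
        flips += (predict(X + noise) != base)
    return flips / n_samples  # per-point estimated flip probability

# Toy usage: a 1-D threshold classifier. A point far from the decision
# boundary should have near-zero epsilon-VCP; a point on the boundary
# should have epsilon-VCP near 0.5.
predict = lambda X: (X[:, 0] > 0).astype(int)
far = epsilon_vcp(predict, np.array([[10.0], [-10.0]]), 0.1, n_samples=200, rng=0)
near = epsilon_vcp(predict, np.array([[0.0]]), 0.1, n_samples=2000, rng=0)
```

Under the paper's claim, a heavily overfit model carves a more convoluted decision boundary through the training region, so more points sit within $\varepsilon$ of the boundary and the average estimate above rises.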