🤖 AI Summary
This work addresses classifier interpretability by generating sparse, manifold-aligned, and semantically plausible counterfactual explanations. Existing methods face key bottlenecks: non-convex objective functions, poorly controllable regularization, and weak generalization across diverse classifiers. To overcome these, we propose the first framework integrating Accelerated Proximal Gradient (APG) into non-convex counterfactual optimization. Our approach supports non-smooth ℓₚ sparsity regularization for 0 ≤ p < 1, jointly incorporates differentiable manifold regularization, and enforces box constraints to preserve feature feasibility. The unified formulation is compatible with various classifiers and plausibility metrics. Experiments on real-world datasets demonstrate that our method efficiently generates high-quality counterfactuals—achieving greater sparsity, closer proximity to the original instance, strict adherence to feature bounds, and improved alignment with the underlying data manifold—outperforming prior approaches in both fidelity and interpretability.
📝 Abstract
We study the problem of finding optimal sparse, manifold-aligned counterfactual explanations for classifiers. Canonically, this can be formulated as an optimization problem with multiple non-convex components, including classifier loss functions and manifold alignment (or *plausibility*) metrics. The added complexity of enforcing *sparsity*, or shorter explanations, complicates the problem further. Existing methods often focus on specific models and plausibility measures, relying on convex ℓ₁ regularizers to enforce sparsity. In this paper, we tackle the canonical formulation using the accelerated proximal gradient (APG) method, a simple yet efficient first-order procedure capable of handling smooth non-convex objectives and non-smooth ℓₚ (where 0 ≤ p < 1) regularizers. This enables our approach to seamlessly incorporate various classifiers and plausibility measures while producing sparser solutions. Our algorithm only requires differentiable data-manifold regularizers and supports box constraints for bounded feature ranges, ensuring the generated counterfactuals remain *actionable*. Finally, experiments on real-world datasets demonstrate that our approach effectively produces sparse, manifold-aligned counterfactual explanations while maintaining proximity to the factual data and computational efficiency.
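To make the abstract's ingredients concrete, the iteration below is a minimal, hedged sketch of an accelerated proximal gradient (FISTA-style) loop of the kind the paper builds on: a gradient step on the smooth part (classifier loss plus a differentiable manifold regularizer), a proximal step for the non-smooth sparsity term, and a projection onto box constraints. For simplicity the sketch uses the closed-form ℓ₁ soft-thresholding prox rather than an ℓₚ (0 ≤ p < 1) prox; all function and parameter names (`grad_f`, `lam`, `step`, `lo`, `hi`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def prox_l1(v, t):
    # Soft-thresholding: proximal operator of t * ||.||_1.
    # (The paper targets l_p with 0 <= p < 1; l_1 is used here
    # only because its prox has this simple closed form.)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def apg_counterfactual(x0, grad_f, lam=0.1, step=0.1,
                       lo=0.0, hi=1.0, iters=200):
    """FISTA-style accelerated proximal gradient sketch.

    x0     : starting point (e.g. the factual instance)
    grad_f : gradient of the smooth part of the objective
             (classifier loss + differentiable manifold regularizer)
    lam    : sparsity regularization weight
    lo, hi : per-feature box constraints (keeps features feasible)
    """
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        # gradient step on the smooth part, then sparsity prox
        x_new = prox_l1(y - step * grad_f(y), step * lam)
        # project onto the feasible box
        x_new = np.clip(x_new, lo, hi)
        # Nesterov momentum update
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

On a toy smooth objective f(x) = ½‖x − b‖² with b = (0.8, 0.05), λ = 0.1, and box [0, 1], the iterates converge to the soft-thresholded, clipped solution (0.7, 0.0), zeroing out the small coordinate, which is the sparsity behavior the abstract describes.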