🤖 AI Summary
This work addresses the lack of causal interpretability in black-box AI models by proposing a causal concept-driven explainable AI framework. The method extracts semantic concepts post hoc, constructs a causal graph linking concepts to model outputs, and quantifies the effect of concept interventions on predictions via the probability of sufficiency, yielding faithful explanations at both local and global levels. Its key innovation is explicitly modeling the causal effects of concept interventions, ensuring explanations are both human-intelligible (high understandability) and consistent with the behavior of the model being explained (high fidelity). Experiments on the CelebA dataset show that the generated concept-based explanations have clear semantics, strong readability, and predictions closely aligned with those of the original model, empirically validating the framework's balance between fidelity and interpretability.
📝 Abstract
This work presents a conceptual framework for causal concept-based post-hoc Explainable Artificial Intelligence (XAI), based on the requirements that explanations for non-interpretable models should be understandable as well as faithful to the model being explained. Local and global explanations are generated by calculating the probability of sufficiency of concept interventions. Example explanations are presented, generated with a proof-of-concept model built to explain classifiers trained on the CelebA dataset. Understandability is demonstrated through a clear concept-based vocabulary, subject to an implicit causal interpretation. Fidelity is addressed by highlighting important framework assumptions, stressing that the context in which an explanation is interpreted must align with the context in which it was generated.
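As a rough illustration of the central quantity, the probability of sufficiency of a concept intervention can be estimated by forcing a concept on in samples where it is absent and the target prediction does not hold, then measuring how often the intervention flips the model's output (Pearl's PS = P(y_x | x', y')). The sketch below is a minimal toy, not the paper's implementation: the classifier, the concept names, and the direct intervention on binary concept vectors are all hypothetical stand-ins (the paper's classifiers operate on CelebA images).

```python
import random

# Hypothetical stand-in for a black-box classifier over binary concept
# vectors; the toy rule predicts 1 only when both concepts are present.
def black_box(sample):
    return int(sample["smiling"] and sample["young"])

def intervene(sample, concept, value):
    """do(concept := value): return a copy of the sample with the concept forced."""
    s = dict(sample)
    s[concept] = value
    return s

def probability_of_sufficiency(model, data, concept, y_target=1):
    """Estimate PS = P(Y_{do(concept:=1)} = y_target | concept = 0, Y != y_target).

    Averaged over samples where the concept is absent and the target output
    is not already predicted, i.e. how often forcing the concept on is
    sufficient to produce the target prediction.
    """
    base = [s for s in data if s[concept] == 0 and model(s) != y_target]
    if not base:
        return None  # conditioning event never observed
    hits = sum(model(intervene(s, concept, 1)) == y_target for s in base)
    return hits / len(base)

random.seed(0)
data = [{"smiling": random.randint(0, 1), "young": random.randint(0, 1)}
        for _ in range(1000)]
ps = probability_of_sufficiency(black_box, data, "smiling")
print(ps)
```

Under the toy rule, forcing "smiling" on succeeds exactly when "young" is already present, so the estimate lands near 0.5; in the framework proper, the interventions would be mediated by the learned causal graph rather than applied directly to input features.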