A Framework for Causal Concept-based Model Explanations

📅 2025-12-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of causal interpretability in black-box AI models by proposing a causal concept-based explainable AI framework. The method extracts semantic concepts post hoc, constructs a causal graph linking concepts to model outputs, and quantifies the effect of concept interventions on predictions via the probability of sufficiency, yielding explanations from both local and global perspectives. Its key innovation is explicitly modeling the causal effects of concept interventions, so that explanations remain human-intelligible while staying faithful to the original model's behavior. Proof-of-concept experiments on classifiers trained on the CelebA dataset show that the generated concept-based explanations have clear semantics and strong readability, and that their predictions align closely with the original model's, supporting the framework's balance between fidelity and interpretability.

📝 Abstract
This work presents a conceptual framework for causal concept-based post-hoc Explainable Artificial Intelligence (XAI), based on the requirements that explanations for non-interpretable models should be understandable as well as faithful to the model being explained. Local and global explanations are generated by calculating the probability of sufficiency of concept interventions. Example explanations are presented, generated with a proof-of-concept model made to explain classifiers trained on the CelebA dataset. Understandability is demonstrated through a clear concept-based vocabulary, subject to an implicit causal interpretation. Fidelity is addressed by highlighting important framework assumptions, stressing that the context of explanation interpretation must align with the context of explanation generation.
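The probability of sufficiency of a concept intervention, which the abstract names as the basis for local and global explanations, can be estimated empirically. A minimal sketch, in which `model`, `intervene`, and the dictionary-based sample format are illustrative assumptions rather than the paper's actual interface:

```python
# Hedged sketch of the probability of sufficiency (PS) of a concept
# intervention: among samples that lack the concept and do not receive the
# target prediction, the fraction whose prediction flips to the target once
# the concept is forced on. All names here are illustrative assumptions.

def probability_of_sufficiency(model, intervene, samples, concept, target):
    eligible = [x for x in samples
                if not x["concepts"].get(concept) and model(x) != target]
    if not eligible:
        return 0.0
    flipped = sum(model(intervene(x, concept)) == target for x in eligible)
    return flipped / len(eligible)

# Toy CelebA-style classifier: predicts "smiling" iff the "smile" concept is on.
def model(x):
    return "smiling" if x["concepts"].get("smile") else "neutral"

def intervene(x, concept):
    return {"concepts": {**x["concepts"], concept: True}}

samples = [{"concepts": {"smile": False}},
           {"concepts": {}},
           {"concepts": {"smile": True}}]

ps = probability_of_sufficiency(model, intervene, samples, "smile", "smiling")
# ps == 1.0: every eligible sample's prediction flips under the intervention
```

A per-sample version of this quantity would give a local explanation, while averaging over a dataset, as above, gives the global view the abstract describes.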
Problem

Research questions and friction points this paper is trying to address.

Develop a causal concept-based framework for post-hoc XAI
Generate local and global explanations via concept intervention probabilities
Ensure explanations are understandable and faithful to the model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal concept-based post-hoc XAI framework
Probability of sufficiency for concept interventions
Aligns interpretability with model fidelity
Anna Rodum Bjøru
Norwegian University of Science and Technology
Jacob Lysnaes-Larsen
Norwegian University of Science and Technology
Oskar Jorgensen
Norwegian University of Science and Technology
Inga Strümke
Norwegian University of Science and Technology
Helge Langseth
Norwegian University of Science and Technology
Bayesian networks · Variational Inference · Machine Learning · Time series analysis