🤖 AI Summary
This paper addresses the challenge of relating causal models of the same system across levels of abstraction. Methodologically, it introduces a category-theoretic framework that formalizes a causal abstraction as a natural transformation between Markov functors, unifying existing notions of abstraction (e.g., τ-consistency and constructive τ-abstractions) within a single categorical setting. It uses string diagrams to describe which high-level graphs remain consistent abstractions of a low-level graph under interventions. Theoretically, it gives categorical proofs and generalizations of existing results, and shows that applying do-calculus to a high-level graphical abstraction of an acyclic directed mixed graph (ADMG) with unobserved confounders yields valid results on the low-level graph, generalizing a statement by Anand et al. (2023). The paper also discusses how methods from mechanistic interpretability, such as circuit analysis and sparse autoencoders, fit within the framework. The authors argue that the framework is better suited to modeling causal abstractions than existing categorical frameworks.
📝 Abstract
We present a categorical framework for relating causal models that represent the same system at different levels of abstraction. We define a causal abstraction as a natural transformation between appropriate Markov functors, which concisely consolidates the desirable properties a causal abstraction should exhibit. Our approach unifies and generalizes previously considered causal abstractions, and we obtain categorical proofs and generalizations of existing results on causal abstractions. Using string-diagrammatic tools, we can explicitly describe the graphs that serve as consistent abstractions of a low-level graph under interventions. We discuss how methods from mechanistic interpretability, such as circuit analysis and sparse autoencoders, fit within our categorical framework. We also show that applying do-calculus to a high-level graphical abstraction of an acyclic directed mixed graph (ADMG), when unobserved confounders are present, gives valid results on the low-level graph, thus generalizing an earlier statement by Anand et al. (2023). We argue that our framework is better suited to modeling causal abstractions than existing categorical frameworks. Finally, we discuss how notions such as $\tau$-consistency and constructive $\tau$-abstractions can be recovered within our framework.
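As an informal illustration of the consistency requirement mentioned above, the following toy sketch checks that abstracting after a low-level intervention gives the same distribution as intervening directly on the high-level model. It is not the paper's categorical construction: the two models, the noise distribution, and the names `low_level`, `high_level`, `tau`, `omega`, `pushforward`, and `is_consistent` are all hypothetical choices made for this example.

```python
from fractions import Fraction
from itertools import product

# Assumed exogenous noise for Y: uniform over {0, 1} (toy choice).
NOISE = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def low_level(x1, x2):
    """Distribution over (x1, x2, y) under do(X1=x1, X2=x2), with Y = X1 + X2 + U."""
    dist = {}
    for u, p in NOISE.items():
        state = (x1, x2, x1 + x2 + u)
        dist[state] = dist.get(state, 0) + p
    return dist


def high_level(x):
    """Distribution over (x, y) under do(X=x), with Y = X + U."""
    dist = {}
    for u, p in NOISE.items():
        state = (x, x + u)
        dist[state] = dist.get(state, 0) + p
    return dist


def tau(state):
    """Hypothetical abstraction map: merge X1 and X2 into their sum X."""
    x1, x2, y = state
    return (x1 + x2, y)


def omega(x1, x2):
    """Hypothetical map from low-level interventions to high-level ones."""
    return x1 + x2


def pushforward(dist, f):
    """Push a finite distribution forward along the map f."""
    out = {}
    for state, p in dist.items():
        image = f(state)
        out[image] = out.get(image, 0) + p
    return out


def is_consistent(interventions, abstraction=tau):
    """Check that abstracting after intervening equals intervening after abstracting."""
    return all(
        pushforward(low_level(a, b), abstraction) == high_level(omega(a, b))
        for a, b in interventions
    )


if __name__ == "__main__":
    print(is_consistent(product(range(3), repeat=2)))
```

Swapping `tau` for a map that ignores `x2`, for example `lambda s: (s[0], s[2])`, makes the check fail, which is the kind of mismatch a consistency condition is meant to rule out.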