🤖 AI Summary
This paper addresses the computation of tight bounds for partially identifiable probabilistic queries in acyclic quasi-Markovian structural causal models (SCMs), where each endogenous variable is connected with at most one exogenous confounder. Endogenous variables are observed and their distribution is known, while exogenous variables are not fully specified. Such bounds are obtained by multilinear programming in general, and by linear programming when a single confounded component is intervened upon. The paper presents a new algorithm that simplifies the construction of these programs by exploiting the input probabilities over endogenous variables. For scenarios with a single intervention, it applies column generation to compute a bound through a sequence of auxiliary linear integer programs, which shows that a representation of the exogenous variables with polynomial cardinality is possible. Experiments show column generation to be superior to existing methods.
📝 Abstract
We investigate partially identifiable queries in a class of causal models. We focus on acyclic Structural Causal Models that are quasi-Markovian (that is, each endogenous variable is connected with at most one exogenous confounder). We look into scenarios where endogenous variables are observed (and a distribution over them is known), while exogenous variables are not fully specified. This leads to a representation that is in essence a Bayesian network where the distribution of root variables is not uniquely determined. In such circumstances, it may not be possible to precisely compute a probability value of interest. We thus study the computation of tight probability bounds, a problem that has been solved by multilinear programming in general, and by linear programming when a single confounded component is intervened upon. We present a new algorithm to simplify the construction of such programs by exploiting input probabilities over endogenous variables. For scenarios with a single intervention, we apply column generation to compute a probability bound through a sequence of auxiliary linear integer programs, thus showing that a representation with polynomial cardinality for exogenous variables is possible. Experiments show column generation techniques to be superior to existing methods.
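To make the linear programming view concrete, the following minimal sketch bounds P(Y=1 | do(X=1)) in a two-variable model X -> Y where one exogenous variable U confounds both. It is not the paper's column generation algorithm. It uses the full canonical representation of U (one state per combination of an X value and a response function for Y) and enumerates the vertices of the feasible polytope by brute force, which is only practical for tiny models. The joint distribution `p_xy` and all names in the code are hypothetical, chosen purely for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical observed joint P(X, Y) for binary X -> Y (illustrative numbers only).
p_xy = {(0, 0): Fraction(3, 10), (0, 1): Fraction(2, 10),
        (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10)}

# Canonical states of the single confounder U: a value for X plus a response
# function for Y, stored as the pair (Y when X=0, Y when X=1). Eight states.
u_states = [(x, f) for x in (0, 1) for f in product((0, 1), repeat=2)]

cells = sorted(p_xy)
# A[i][j] = 1 when state j of U produces the observation cells[i].
A = [[Fraction(int((x, f[x]) == cell)) for (x, f) in u_states] for cell in cells]
b = [p_xy[cell] for cell in cells]
# Objective: P(Y=1 | do(X=1)) is the total mass of states whose function maps X=1 to Y=1.
c = [Fraction(f[1]) for (_, f) in u_states]


def solve_square(M, rhs):
    """Gauss-Jordan elimination over Fractions; returns None if M is singular."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def query_bounds(A, b, c):
    """Min and max of c.p over {p >= 0, A p = b}, via all basic feasible solutions."""
    m, n = len(A), len(A[0])
    values = []
    for basis in combinations(range(n), m):
        sub = [[A[i][j] for j in basis] for i in range(m)]
        sol = solve_square(sub, b)
        if sol is not None and all(v >= 0 for v in sol):
            values.append(sum(c[j] * v for j, v in zip(basis, sol)))
    return min(values), max(values)


lower, upper = query_bounds(A, b, c)
print(lower, upper)  # 2/5 9/10
```

For these numbers the interval is [P(X=1, Y=1), P(X=1, Y=1) + P(X=0)] = [2/5, 9/10]. The number of canonical states of U grows exponentially with the number and cardinality of the variables it confounds, so enumerating every column as done here does not scale. That growth is the cost the abstract's column generation approach is meant to avoid, by producing columns on demand through auxiliary integer programs.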