Effects of Distributional Biases on Gradient-Based Causal Discovery in the Bivariate Categorical Case

📅 2025-09-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a fundamental vulnerability of gradient-based causal discovery methods in the bivariate categorical setting: systematic bias induced by distributional asymmetries in the data, specifically Marginal Distribution Asymmetry and Marginal Distribution Shift Asymmetry, which mislead causal learning and produce spurious preferences for particular causal factorizations. To dissect this phenomenon, the authors formally define and empirically validate both asymmetries, generate controllable synthetic data using Dirichlet priors, and attribute the effects via marginal/conditional distribution modeling and interventional experiments. The key finding is that eliminating competition among alternative causal factorizations significantly improves robustness to distributional shifts. This insight provides both theoretical grounding and practical design principles for building gradient-based causal discovery algorithms that are resilient to distributional shift.

📝 Abstract
Gradient-based causal discovery shows great potential for deducing causal structure from data in an efficient and scalable way. These approaches, however, can be susceptible to distributional biases in the data they are trained on. We identify two such biases: Marginal Distribution Asymmetry, where differences in entropy skew causal learning toward certain factorizations, and Marginal Distribution Shift Asymmetry, where repeated interventions shift some variables faster than others. For the bivariate categorical setup with Dirichlet priors, we illustrate how these biases can arise even in controlled synthetic data. To examine their impact on gradient-based methods, we employ two simple models that derive causal factorizations by learning marginal or conditional data distributions, a common strategy in gradient-based causal discovery. We demonstrate how these models can be susceptible to both biases, and additionally show how the biases can be controlled. An empirical evaluation of two related, existing approaches indicates that eliminating competition between possible causal factorizations can make models robust to the presented biases.
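The bivariate categorical setup with Dirichlet priors described above can be sketched in a few lines: sample a marginal P(X) and a conditional P(Y|X) from Dirichlet priors, then compare the entropies of the two marginals. This is a minimal illustration, not the paper's code; the category count `K` and the concentration values `alpha_marginal` and `alpha_cond` are illustrative choices, but they show how a small concentration parameter yields skewed, low-entropy marginals, the kind of Marginal Distribution Asymmetry the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5  # categories per variable (illustrative choice)

# Sample P(X) from a Dirichlet prior; a small concentration tends to
# produce skewed (low-entropy) marginals, a large one near-uniform ones.
alpha_marginal = 0.2
p_x = rng.dirichlet(alpha_marginal * np.ones(K))

# Sample one conditional P(Y | X = x) per category of X.
alpha_cond = 1.0
p_y_given_x = rng.dirichlet(alpha_cond * np.ones(K), size=K)

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Marginal of Y implied by the factorization P(X) P(Y | X).
p_y = p_x @ p_y_given_x

# Marginal Distribution Asymmetry: H(X) and H(Y) generally differ,
# which can bias a distribution-fitting score toward one factorization.
print("H(X) =", entropy(p_x), " H(Y) =", entropy(p_y))
```

Sweeping `alpha_marginal` while holding `alpha_cond` fixed gives the kind of controllable synthetic data the paper uses to probe how entropy differences between variables influence which factorization a gradient-based learner prefers.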
Problem

Research questions and friction points this paper is trying to address.

Analyzing distributional biases in gradient-based causal discovery
Examining entropy and intervention effects on causal learning
Developing models robust to marginal distribution asymmetries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Controls marginal and conditional distribution biases
Eliminates competition between causal factorizations
Uses gradient-based methods with Dirichlet priors