🤖 AI Summary
This paper addresses causal direction identification between two variables from purely observational data. We propose a novel causal discovery method grounded in the algorithmic Markov condition, which—uniquely—models the variational Bayesian neural network (VBNN) learning process directly as an estimator of encoding length under causal factorization, thereby integrating the minimum description length principle with the strong representational capacity of neural networks. Unlike conventional Gaussian process-based approaches, our framework overcomes their computational complexity and expressiveness limitations by jointly optimizing variational inference, parameter compression, and causal structure. Extensive experiments on both synthetic and real-world benchmarks demonstrate that our method significantly outperforms state-of-the-art complexity-driven and structural causal model regression approaches, advancing bivariate causal discovery to a new performance frontier.
📝 Abstract
Telling apart the cause and effect between two random variables with purely observational data is a challenging problem that finds applications in various scientific disciplines. A key principle utilized in this task is the algorithmic Markov condition, which postulates that the joint distribution, when factorized according to the causal direction, yields a more succinct codelength compared to the anti-causal direction. Previous approaches approximate these codelengths by relying on simple functions or Gaussian processes (GPs) with easily evaluable complexity, compromising between model fitness and computational complexity. To overcome these limitations, we propose leveraging the variational Bayesian learning of neural networks as an interpretation of the codelengths. Consequently, we can enhance the model fitness while promoting the succinctness of the codelengths, while avoiding the significant computational complexity of the GP-based approaches. Extensive experiments on both synthetic and real-world benchmarks in cause-effect identification demonstrate the effectiveness of our proposed method, surpassing the overall performance of related complexity-based and structural causal model regression-based approaches.