From Observations to Causations: A GNN-based Probabilistic Prediction Framework for Causal Discovery

📅 2025-07-27

🤖 AI Summary

Causal discovery infers causal structure among variables from observational data, but existing methods scale poorly and struggle to capture global structural information. This paper proposes a probabilistic causal graph learning framework based on graph neural networks (GNNs) that treats causal discovery as a supervised, end-to-end graph-structure prediction task. A GNN jointly encodes node and edge attributes, and the model is trained on synthetic datasets augmented with statistical and information-theoretic measures, such as mutual information and conditional entropy, to learn a probability distribution over the space of causal graphs rather than a single deterministic graph. Once trained, the model is applied to new datasets without further training. On synthetic and real-world datasets, the authors report better accuracy and scalability than traditional methods, recent non-GNN methods, and a GNN-based baseline.

📝 Abstract

Causal discovery from observational data is challenging, especially with large datasets and complex relationships. Traditional methods often struggle with scalability and capturing global structural information. To overcome these limitations, we introduce a novel graph neural network (GNN)-based probabilistic framework that learns a probability distribution over the entire space of causal graphs, unlike methods that output a single deterministic graph. Our framework leverages a GNN that encodes both node and edge attributes into a unified graph representation, enabling the model to learn complex causal structures directly from data. The GNN model is trained on a diverse set of synthetic datasets augmented with statistical and information-theoretic measures, such as mutual information and conditional entropy, capturing both local and global data properties. We frame causal discovery as a supervised learning problem, directly predicting the entire graph structure. Our approach demonstrates superior performance, outperforming both traditional and recent non-GNN-based methods, as well as a GNN-based approach, in terms of accuracy and scalability on synthetic and real-world datasets without further training. This probabilistic framework significantly improves causal structure learning, with broad implications for decision-making and scientific discovery across various fields.
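
The abstract names mutual information and conditional entropy as measures used to augment the training data. As a rough illustration of what such pairwise measures look like, the sketch below estimates them from equal-width histograms. The function names, the binning scheme, and the layout of the feature tensor are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def entropy(counts):
    """Shannon entropy (in nats) of a histogram given as raw counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def pairwise_info_features(x, y, bins=8):
    """Histogram-based estimates of I(X;Y), H(X|Y) and H(Y|X).

    Assumption: equal-width binning; the paper does not state its estimator.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_xy = entropy(joint.ravel())
    h_x = entropy(joint.sum(axis=1))  # marginal over y
    h_y = entropy(joint.sum(axis=0))  # marginal over x
    return {
        "mutual_information": h_x + h_y - h_xy,
        "cond_entropy_x_given_y": h_xy - h_y,
        "cond_entropy_y_given_x": h_xy - h_x,
    }


def edge_feature_tensor(data, bins=8):
    """Build a hypothetical (n_vars, n_vars, 2) tensor of edge attributes.

    Channel 0 holds I(X_i; X_j); channel 1 holds H(X_j | X_i).
    """
    n_vars = data.shape[1]
    feats = np.zeros((n_vars, n_vars, 2))
    for i in range(n_vars):
        for j in range(n_vars):
            if i == j:
                continue
            f = pairwise_info_features(data[:, i], data[:, j], bins)
            feats[i, j, 0] = f["mutual_information"]
            feats[i, j, 1] = f["cond_entropy_y_given_x"]
    return feats
```

Mutual information is symmetric, so on its own it says nothing about edge direction; the conditional-entropy channel is asymmetric, which is one plausible reason to include both as edge attributes.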

Problem

Research questions and friction points this paper is trying to address.

Challenges in causal discovery from large, complex observational data

Scalability and global structural limitations in traditional methods

Need for a probability distribution over the entire space of causal graphs rather than a single predicted graph

Innovation

Methods, ideas, or system contributions that make the work stand out.

GNN-based probabilistic framework for causal graphs

Encodes node and edge attributes jointly

Supervised learning predicts entire graph structure
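
To make "a probability distribution over causal graphs" concrete, the sketch below takes a matrix of per-edge probabilities and uses it to sample graphs, score a graph, and extract a point estimate. It assumes the distribution factorizes into independent Bernoulli edges, which is a simplification chosen for illustration; the paper's actual output parameterization is not described in this summary, and all function names here are hypothetical. Sampled graphs are not checked for acyclicity.

```python
import numpy as np


def sample_graphs(edge_probs, n_samples, rng):
    """Draw adjacency matrices, treating each directed edge as an independent Bernoulli."""
    p = np.array(edge_probs, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(p, 0.0)               # no self-loops
    return (rng.random((n_samples,) + p.shape) < p).astype(int)


def log_prob(adj, edge_probs, eps=1e-12):
    """Log-probability of one graph under the factorized edge model (off-diagonal only)."""
    p = np.clip(np.array(edge_probs, dtype=float), eps, 1 - eps)
    a = np.asarray(adj)
    off_diag = ~np.eye(a.shape[0], dtype=bool)
    ll = a * np.log(p) + (1 - a) * np.log(1 - p)
    return float(ll[off_diag].sum())


def most_likely_graph(edge_probs, threshold=0.5):
    """Point estimate: keep every edge whose probability exceeds the threshold."""
    p = np.array(edge_probs, dtype=float)
    np.fill_diagonal(p, 0.0)
    return (p > threshold).astype(int)
```

Under this factorized assumption, thresholding at 0.5 yields the single most probable graph, while the samples show how much uncertainty remains on each edge.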
