🤖 AI Summary
DAG learning faces two fundamental challenges: super-exponential computational complexity and structural unidentifiability under limited samples. To address these, this paper introduces the first foundation model framework for DAG learning. It pretrains a unified mapping from data distributions to both causal graph structures and their parametric forms, incorporating a shared low-dimensional prior to enhance few-shot generalization and zero-shot inference. The paper proposes Attention-DAG (ADAG), a novel architecture that integrates linear Transformers with a nonlinear attention kernel, enabling efficient end-to-end causal structure learning. Evaluated on standard synthetic benchmarks, ADAG achieves significant improvements in structural recovery accuracy, supports rapid zero-shot inference, and demonstrates strong scalability and practical deployability, validating its effectiveness, generalizability, and potential for real-world causal discovery applications.
📝 Abstract
Owing to its human interpretability and invariance properties, the Directed Acyclic Graph (DAG) has been a foundational tool across various areas of AI research, leading to significant advancements. However, DAG learning remains highly challenging, due to the super-exponential growth of computational cost with the number of variables and to identifiability issues, particularly in small-sample regimes. To address these two challenges, in this work we leverage the recent success of linear transformers and develop a foundation model approach for discovering multiple order-consistent DAGs across tasks. In particular, we propose Attention-DAG (ADAG), a novel attention-mechanism-based architecture for learning multiple linear Structural Equation Models (SEMs). ADAG learns the mapping from observed data to both graph structure and parameters via a nonlinear attention-based kernel, enabling efficient multi-task estimation of the underlying linear SEMs. By formulating the learning process across multiple tasks as a continuous optimization problem, the pre-trained ADAG model captures common structural properties as a shared low-dimensional prior, thereby reducing the ill-posedness of downstream DAG learning tasks in small-sample regimes. We evaluate our proposed approach on benchmark synthetic datasets and find that ADAG achieves substantial improvements in both DAG learning accuracy and zero-shot inference efficiency. To the best of our knowledge, this is the first practical approach for pre-training a foundation model specifically designed for DAG learning, representing a step toward more efficient and generalizable downstream applications in causal discovery.
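The linear SEM setting the abstract refers to can be made concrete with a small synthetic example. The sketch below (illustrative only, not the paper's code; dimensions, weight ranges, and variable names are assumptions) generates data from a linear SEM whose weighted adjacency matrix defines a DAG, the kind of task instance a pre-trained model such as ADAG would take as input:

```python
import numpy as np

# In a linear SEM, each variable is a weighted sum of its parents plus noise:
#   X = X @ W + E,
# where W is the weighted adjacency matrix of a DAG (W[i, j] != 0 means
# an edge i -> j). Recovering W's support and values from samples of X is
# the DAG learning problem.

rng = np.random.default_rng(0)
d, n = 5, 1000  # number of variables, number of samples (illustrative choices)

# A strictly upper-triangular W is acyclic by construction
# (variables are already in a topological order).
mask = rng.choice([0.0, 1.0], size=(d, d), p=[0.6, 0.4])
W = np.triu(rng.uniform(0.5, 1.5, size=(d, d)) * mask, k=1)

# Ancestral sampling: visit variables in topological order so that
# every parent is generated before its children.
E = rng.normal(size=(n, d))  # exogenous noise
X = np.zeros((n, d))
for j in range(d):
    X[:, j] = X @ W[:, j] + E[:, j]

# Sanity check against the closed form: X (I - W) = E  =>  X = E (I - W)^{-1}.
X_closed = E @ np.linalg.inv(np.eye(d) - W)
assert np.allclose(X, X_closed)
```

A foundation model approach in this setting would be pre-trained on many such (data, DAG) pairs drawn from related tasks, so that shared structure across the tasks acts as a prior when only a few samples per task are available.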