🤖 AI Summary
Full-graph causal discovery in large-scale systems suffers from high computational complexity, and intervention costs and feasibility vary across variables. Method: We propose a target-oriented local causal discovery paradigm that directly identifies the set of causal factors of a target variable, bypassing full causal graph reconstruction. We design a neural network model with linear complexity in the number of variables that integrates local inference strategies with causal identifiability modeling and is trained under supervision on synthetic data. Contribution/Results: Our approach avoids the bottleneck of traditional global reconstruction and generalizes to out-of-distribution graph structures and generating mechanisms. Experiments on *E. coli* and human K562 gene regulatory networks show improved performance over full-graph baselines, scalability to thousands of variables, and support for choosing interventions when costs and feasibility differ across variables. The code is publicly available.
📝 Abstract
We propose a novel machine learning approach for inferring the causal variables of a target variable from observations. Our focus is on directly inferring a set of causal factors without reconstructing the full causal graph, which is computationally challenging in large-scale systems. The identified causal set consists of all potential regulators of the target variable under experimental settings, enabling efficient regulation when intervention costs and feasibility vary across variables. To achieve this, we train a neural network with supervised learning on simulated data to infer causality. By employing a local-inference strategy, our approach has linear complexity in the number of variables and scales to thousands of variables. Empirical results demonstrate superior performance in identifying causal relationships within large-scale gene regulatory networks, outperforming existing methods that emphasize full-graph discovery. We validate our model's generalization capability across out-of-distribution graph structures and generating mechanisms, including gene regulatory networks of E. coli and the human K562 cell line. The implementation code is available at https://github.com/snu-mllab/Targeted-Cause-Discovery.
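
To illustrate why scoring candidates locally against a single target grows linearly with the number of variables, here is a minimal sketch. It is not the authors' model: the names `score_candidates` and `abs_correlation` are hypothetical, the data are simulated for illustration, and absolute Pearson correlation is only a placeholder for the trained neural network (correlation alone does not establish causation).

```python
import numpy as np


def score_candidates(data, target_idx, scorer):
    """Score every non-target variable against the target.

    Makes one scorer call per candidate, so the number of calls is
    linear in the number of variables. `scorer` is a hypothetical
    stand-in for a learned model.
    """
    n_vars = data.shape[1]
    target = data[:, target_idx]
    scores = np.full(n_vars, -np.inf)  # the target itself keeps -inf
    for j in range(n_vars):
        if j == target_idx:
            continue
        scores[j] = scorer(data[:, j], target)
    return scores


def abs_correlation(x, y):
    # Placeholder scorer (assumption): absolute Pearson correlation.
    return abs(np.corrcoef(x, y)[0, 1])


# Simulated data (assumption): 1000 variables, with variables 3 and 7
# driving the target at index 0 through a linear mechanism.
rng = np.random.default_rng(0)
n_samples, n_vars = 500, 1000
data = rng.normal(size=(n_samples, n_vars))
data[:, 0] = 2.0 * data[:, 3] - 1.5 * data[:, 7] + 0.1 * rng.normal(size=n_samples)

scores = score_candidates(data, target_idx=0, scorer=abs_correlation)
top2 = set(np.argsort(scores)[-2:].tolist())  # two highest-scoring candidates
```

In this toy setting the two highest-scoring candidates are the simulated drivers. A full-graph method would instead have to reason about every pair of variables, which is the cost the targeted formulation avoids.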