🤖 AI Summary
High-dimensional causal structure learning is hampered by a super-exponentially growing search space and high computational cost. This work proposes VISTA, a modular framework that decomposes the global problem into local subgraphs defined by Markov blankets, aggregates the local results through weighted voting that penalizes low-support edges via exponential decay, and prunes unreliable edges with an adaptive threshold. Acyclicity is enforced with a feedback arc set (FAS) algorithm. VISTA is model-agnostic, fully parallelizable, and requires no specific structural form, and it comes with finite-sample error bounds and an asymptotic consistency guarantee under mild conditions. Experiments on synthetic and real datasets show notable improvements in both accuracy and efficiency over a wide range of base learners.
📝 Abstract
Learning causal structures from observational data remains a fundamental yet computationally intensive task, particularly in high-dimensional settings where existing methods face challenges such as the super-exponential growth of the search space and increasing computational demands. To address this, we introduce VISTA (Voting-based Integration of Subgraph Topologies for Acyclicity), a modular framework that decomposes the global causal structure learning problem into local subgraphs based on Markov blankets. Global integration is achieved through a weighted voting mechanism that penalizes low-support edges via exponential decay, filters unreliable ones with an adaptive threshold, and ensures acyclicity using a Feedback Arc Set (FAS) algorithm. The framework is model-agnostic, imposing no assumptions on the inductive biases of base learners, is compatible with arbitrary data settings without requiring specific structural forms, and fully supports parallelization. We also establish finite-sample error bounds for VISTA and prove its asymptotic consistency under mild conditions. Extensive experiments on both synthetic and real datasets consistently demonstrate the effectiveness of VISTA, yielding notable improvements in both accuracy and efficiency over a wide range of base learners.
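The integration stage described in the abstract (vote, threshold, break cycles) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the scoring formula (vote fraction damped by `1 - exp(-decay * support)`), the mean-score cutoff standing in for the adaptive threshold, and the greedy "drop the weakest edge on each remaining cycle" heuristic standing in for the FAS algorithm. Local subgraphs are assumed to arrive as `(nodes, edges)` pairs produced by some base learner.

```python
import math
from collections import defaultdict


def aggregate_votes(local_graphs, decay=1.0):
    """Score directed edges from local results given as (nodes, edges) pairs."""
    votes = defaultdict(int)    # subgraphs that predicted u -> v
    support = defaultdict(int)  # subgraphs that contained both u and v
    for nodes, edges in local_graphs:
        for u in nodes:
            for v in nodes:
                if u != v:
                    support[(u, v)] += 1
        for edge in edges:
            votes[edge] += 1
    # Hypothetical weighting: vote fraction, damped when support is low.
    return {
        e: (votes[e] / support[e]) * (1.0 - math.exp(-decay * support[e]))
        for e in votes
        if support[e] > 0
    }


def adaptive_threshold(scores):
    """Hypothetical rule: keep edges scoring at least the mean score."""
    if not scores:
        return {}
    cutoff = sum(scores.values()) / len(scores)
    return {e: s for e, s in scores.items() if s >= cutoff}


def find_cycle(edges):
    """Return the edges of one directed cycle, or None if the graph is acyclic."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    state = {}  # missing = unvisited, 1 = on the DFS stack, 2 = finished

    def dfs(u, path):
        state[u] = 1
        path.append(u)
        for v in adj[u]:
            if state.get(v, 0) == 1:  # back edge closes a cycle
                cyc = path[path.index(v):] + [v]
                return list(zip(cyc, cyc[1:]))
            if state.get(v, 0) == 0:
                found = dfs(v, path)
                if found:
                    return found
        path.pop()
        state[u] = 2
        return None

    for node in list(adj):
        if state.get(node, 0) == 0:
            found = dfs(node, [])
            if found:
                return found
    return None


def break_cycles(scores):
    """Greedy stand-in for FAS: drop the lowest-score edge on each cycle."""
    kept = dict(scores)
    while True:
        cycle = find_cycle(kept)
        if cycle is None:
            return kept
        del kept[min(cycle, key=lambda e: kept[e])]


def integrate(local_graphs, decay=1.0):
    """Vote, threshold, then enforce acyclicity."""
    scores = aggregate_votes(local_graphs, decay)
    return break_cycles(adaptive_threshold(scores))


# Toy input: three overlapping local subgraphs over variables A-D.
local_graphs = [
    ({"A", "B", "C"}, {("A", "B"), ("B", "C")}),
    ({"B", "C", "D"}, {("B", "C"), ("C", "D")}),
    ({"A", "C", "D"}, {("C", "D"), ("D", "A"), ("C", "A")}),
]
dag = integrate(local_graphs)
```

Because each local subgraph is processed independently before `aggregate_votes` runs, the local learning step could be distributed across workers, which is the property the abstract refers to as full parallelizability. The greedy cycle breaker is chosen here only for brevity; it does not aim at a minimum feedback arc set.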