A Fast Kernel-based Conditional Independence test with Application to Causal Discovery

📅 2025-05-16
🤖 AI Summary
The kernel-based conditional independence (KCI) test is a pivotal nonparametric test in causal discovery, yet its O(n³) computational complexity severely limits its scalability to large datasets. To address this, we propose FastKCI, the first scalable, parallelized, and efficient KCI testing framework. Our method decouples and coordinates the testing procedure via (1) a mixture-of-experts architecture with importance-weighted aggregation, and (2) a novel integration of Gaussian mixture-based sample grouping, local kernel independence tests, and embarrassingly parallel Gaussian process inference, which together break the cubic complexity barrier. Extensive experiments show that FastKCI retains statistical power comparable to the original KCI test on both synthetic and real-world benchmarks, while achieving speedups of up to tens of times and enabling real-time causal structure learning on datasets with tens of thousands of samples.

📝 Abstract
Kernel-based conditional independence (KCI) testing is a powerful nonparametric method commonly employed in causal discovery tasks. Despite its flexibility and statistical reliability, cubic computational complexity limits its application to large datasets. To address this computational bottleneck, we propose FastKCI, a scalable and parallelizable kernel-based conditional independence test that utilizes a mixture-of-experts approach inspired by embarrassingly parallel inference techniques for Gaussian processes. By partitioning the dataset based on a Gaussian mixture model over the conditioning variables, FastKCI conducts local KCI tests in parallel, aggregating the results using an importance-weighted sampling scheme. Experiments on synthetic datasets and benchmarks on real-world production data validate that FastKCI maintains the statistical power of the original KCI test while achieving substantial computational speedups. FastKCI thus represents a practical and efficient solution for conditional independence testing in causal inference on large-scale data.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational complexity of kernel-based conditional independence tests
Enabling scalable causal discovery for large datasets
Maintaining statistical power while improving computational efficiency
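
To make the cubic bottleneck concrete, here is a rough cost model rather than a measurement. It assumes one kernel test on n samples costs about n³ operations, and that splitting the data into k equal blocks replaces it with k tests on n/k samples each. The function names are hypothetical, and the model ignores the cost of fitting the partition, any repeated partition sampling, and unequal block sizes.

```python
def full_test_cost(n: int) -> int:
    """Cost proxy for one kernel test on all n samples: n**3 operations."""
    return n ** 3


def partitioned_cost(n: int, k: int) -> int:
    """Cost proxy for k local tests, each on an equal block of n // k samples."""
    block = n // k
    return k * block ** 3


n, k = 10_000, 10
speedup = full_test_cost(n) / partitioned_cost(n, k)
print(speedup)  # 100.0
```

Under these assumptions the total work drops by a factor of k², and running the k local tests in parallel could reduce wall-clock time further.
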
Innovation

Methods, ideas, or system contributions that make the work stand out.

FastKCI uses mixture-of-experts for parallel testing
Partitions data via Gaussian mixture model
Aggregates results with importance-weighted sampling
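
The three points above can be sketched as a control-flow skeleton. This is a minimal illustration under stated assumptions, not the authors' implementation: cluster labels are taken as given (in FastKCI they would come from a Gaussian mixture model over the conditioning variables), `toy_local_stat` is a placeholder that stands in for a local KCI test, summing block statistics within a partition is an assumption, and the log-weights of candidate partitions are hypothetical inputs. Only the self-normalized importance weighting is a standard technique; all function names are invented for the example.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_label(rows, labels):
    """Group samples by a precomputed cluster label (one label per sample)."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return dict(groups)


def run_local_tests(groups, local_test, max_workers=4):
    """Apply local_test to every block concurrently; returns {label: statistic}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {label: pool.submit(local_test, block)
                   for label, block in groups.items()}
        return {label: fut.result() for label, fut in futures.items()}


def normalized_weights(log_weights):
    """Self-normalized importance weights, computed stably from log-weights."""
    top = max(log_weights)
    unnorm = [math.exp(lw - top) for lw in log_weights]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def aggregate(stats, log_weights):
    """Importance-weighted average of one statistic per candidate partition."""
    weights = normalized_weights(log_weights)
    return sum(w * s for w, s in zip(weights, stats))


def toy_local_stat(block):
    """Placeholder (NOT a KCI test): squared sample covariance of x and y."""
    n = len(block)
    mean_x = sum(x for x, _, _ in block) / n
    mean_y = sum(y for _, y, _ in block) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y, _ in block) / n
    return cov ** 2


# Each row is a hypothetical (x, y, z) sample; z is the conditioning variable.
rows = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.1), (2.0, 2.0, 5.0), (3.0, 1.0, 5.2)]
candidate_labels = [[0, 0, 1, 1], [0, 0, 0, 1]]  # two assumed partitions
log_weights = [-1.0, -3.0]                       # hypothetical partition scores

partition_stats = []
for labels in candidate_labels:
    groups = split_by_label(rows, labels)
    local = run_local_tests(groups, toy_local_stat)
    partition_stats.append(sum(local.values()))  # assumed within-partition rule

print(aggregate(partition_stats, log_weights))
```

Threads keep the sketch simple and portable. A real kernel test is compute-bound, so separate processes or a cluster scheduler would be the more likely choice in practice.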