🤖 AI Summary
This work addresses the problem of learning causal structures and causal orderings for directed acyclic graph (DAG) models under heterogeneous data settings. The authors propose a joint order-based scoring framework, develop an order-based Bayesian method for high-dimensional Gaussian DAG models, and show that leveraging data heterogeneity improves the accuracy of causal ordering estimation. Theoretically, they show that in the most favorable case the causal ordering is identifiable up to two permutations. For posterior inference over the space of orderings, they introduce a random-to-random (R2R) proposal neighborhood for the Metropolis–Hastings algorithm, which is theoretically motivated and mixes efficiently. Simulation studies show strong empirical performance, and an application to single-nucleus RNA sequencing data from major depressive disorder demonstrates the method's practical utility.
📝 Abstract
We propose a joint order-based scoring framework for causal structure learning of directed acyclic graph (DAG) models under heterogeneous data settings. We show that leveraging heterogeneity improves the accuracy of causal ordering estimation. In the most favorable case, the causal ordering is identifiable up to two permutations. Building on this framework, we propose an order-based Bayesian method for Gaussian DAG models and establish its theoretical properties in the high-dimensional regime. For posterior inference over the space of orderings, we introduce a random-to-random (R2R) proposal neighborhood for the Metropolis-Hastings algorithm, which is theoretically motivated and exhibits efficient mixing behavior. Simulation studies confirm the strong empirical performance of the proposed method, and an application to single-nucleus RNA sequencing data from major depressive disorder demonstrates practical utility.
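To make the sampling idea concrete, here is a minimal Python sketch of Metropolis-Hastings over orderings with a random-to-random style move: remove one uniformly chosen node from the ordering and reinsert it at a uniformly chosen position. This is an illustration under stated assumptions, not the authors' implementation. The paper's exact R2R neighborhood may be defined differently, and `log_score` is a hypothetical stand-in for the paper's order-based posterior score; the function names `r2r_move`, `mh_over_orders`, and `toy_log_score` are invented for this example.

```python
import math
import random
from collections import Counter


def r2r_move(order, rng):
    """Move one uniformly chosen node to a uniformly chosen position.

    Assumption: this is the generic random-to-random insertion move;
    the paper's R2R neighborhood may differ in detail.
    """
    n = len(order)
    i = rng.randrange(n)  # position of the node to remove
    j = rng.randrange(n)  # position at which to reinsert it
    new_order = list(order)
    node = new_order.pop(i)
    new_order.insert(j, node)
    return new_order


def mh_over_orders(log_score, init_order, n_iter, seed=0):
    """Metropolis-Hastings over permutations.

    The move (i, j) is undone by the move (j, i), and both are drawn with
    probability 1/n^2, so the proposal is symmetric and the acceptance
    ratio reduces to the ratio of scores.
    """
    rng = random.Random(seed)
    current = list(init_order)
    current_score = log_score(current)
    samples = []
    for _ in range(n_iter):
        proposal = r2r_move(current, rng)
        proposal_score = log_score(proposal)
        # 1 - random() lies in (0, 1], which avoids log(0).
        if math.log(1.0 - rng.random()) < proposal_score - current_score:
            current, current_score = proposal, proposal_score
        samples.append(tuple(current))
    return samples


def toy_log_score(order, beta=3.0):
    """Hypothetical score: penalize inversions relative to 0, 1, 2, ..."""
    inversions = sum(
        1
        for a in range(len(order))
        for b in range(a + 1, len(order))
        if order[a] > order[b]
    )
    return -beta * inversions


if __name__ == "__main__":
    draws = mh_over_orders(toy_log_score, [3, 2, 1, 0], n_iter=2000, seed=1)
    print(Counter(draws).most_common(3))
```

In a real application, `toy_log_score` would be replaced by a log posterior score of the ordering computed from the heterogeneous data sets. Because every ordering can reach every other through a sequence of such insertions, the chain is irreducible on the space of permutations.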