🤖 AI Summary
This paper addresses causal inference across heterogeneous sites when the target site provides only control-group data and no treatment-group observations. We propose a distributional causal inference framework grounded in optimal transport, modeling inter-site heterogeneity as a push-forward map between probability distributions. The map is learned by aligning the control-group distributions of the source and target sites, and is then applied to the source treatment group to synthesize the counterfactual treatment-group distribution at the target site. The transfer therefore operates on whole distributions rather than on average treatment effects alone. Simulation studies across multiple data-generating scenarios and an application to real-world patient-derived xenograft data show that the method recovers the distributional properties of the treatment effect at the target site. Under general regularity conditions, the synthetic treatment-group data are shown to be consistent and to converge asymptotically to the true target distribution. The work extends synthetic-control methodology to distribution-level causal inference in heterogeneous multi-site settings.
📝 Abstract
We propose a novel framework for synthesizing counterfactual treatment group data in a target site by integrating full treatment and control group data from a source site with control group data from the target. Departing from conventional average treatment effect estimation, our approach adopts a distributional causal inference perspective by modeling treatment and control as distinct probability measures on the source and target sites. We formalize the cross-site heterogeneity (effect modification) as a push-forward transformation that maps the joint feature-outcome distribution from the source to the target site. This transformation is learned by aligning the control group distributions between sites using an Optimal Transport-based procedure, and subsequently applied to the source treatment group to generate the synthetic target treatment distribution. Under general regularity conditions, we establish theoretical guarantees for the consistency and asymptotic convergence of the synthetic treatment group data to the true target distribution. Simulation studies across multiple data-generating scenarios and a real-world application to patient-derived xenograft data demonstrate that our framework robustly recovers the full distributional properties of treatment effects.
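The two-step procedure described in the abstract (learn a map from the control groups, then apply it to the source treatment group) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single scalar outcome with no features, where the optimal transport map under squared cost reduces to the monotone quantile-to-quantile map. The function name `fit_ot_map_1d`, the simulated data, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_ot_map_1d(source_control, target_control):
    """Estimate the 1D optimal transport map from source to target controls.

    In one dimension with squared cost, the OT map is the monotone
    rearrangement T = F_target^{-1} o F_source, estimated here from
    empirical CDFs and quantiles. (Simplifying assumption: scalar outcome.)
    """
    src_sorted = np.sort(np.asarray(source_control, dtype=float))
    tgt = np.asarray(target_control, dtype=float)
    n = src_sorted.size
    # Mid-point probability levels attached to the sorted source sample.
    levels = (np.arange(n) + 0.5) / n

    def transport(x):
        x = np.asarray(x, dtype=float)
        # Interpolated empirical CDF of the source controls at x.
        # Values outside the observed range are clamped by np.interp.
        u = np.interp(x, src_sorted, levels)
        # Empirical quantile function of the target controls at level u.
        return np.quantile(tgt, u)

    return transport


# Simulated example (all distributions are assumptions for illustration).
rng = np.random.default_rng(0)
n = 5000
source_control = rng.normal(0.0, 1.0, n)   # source site, control arm
source_treated = rng.normal(1.0, 1.0, n)   # source site, treatment arm
target_control = rng.normal(2.0, 1.5, n)   # target site, control arm only

# Step 1: learn the cross-site map by aligning the control groups.
transport = fit_ot_map_1d(source_control, target_control)

# Step 2: push the source treatment group through the same map to obtain
# a synthetic target treatment sample.
synthetic_target_treated = transport(source_treated)

# In this simulation the true map is T(x) = 2 + 1.5 x, so the synthetic
# sample should be close to N(3.5, 1.5^2) up to sampling error.
print(synthetic_target_treated.mean(), synthetic_target_treated.std())
```

Applying the control-group map to the treatment group is valid only under the modeling assumption stated in the abstract, namely that cross-site heterogeneity acts as one push-forward transformation shared by both arms. For the joint feature-outcome distribution the paper considers, the quantile map above would have to be replaced by a multivariate optimal transport solver.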