🤖 AI Summary
This study addresses causal survival analysis across multiple data sources, where right-censoring, the integration of discrete and continuous time, and cross-site distribution shift complicate estimation. It proposes two methods for estimating target-site-specific causal effects: (i) a semiparametric efficient estimator for settings where individual-level data can be shared across sites, and (ii) a privacy-preserving federated learning framework that dynamically reweights each source site's contribution according to its discrepancy from the target population. Both methods rely on flexible, nonparametric machine learning models to improve robustness and efficiency. The approaches are illustrated through simulation studies and multi-site randomized trials of monoclonal neutralizing antibodies for HIV-1 prevention, and the authors conclude that they can enable efficient, privacy-preserving causal inference for time-to-event outcomes under distribution shift.
📝 Abstract
Causal inference across multiple data sources offers a promising avenue to enhance the generalizability and replicability of scientific findings. However, data integration methods for time-to-event outcomes, common in biomedical research, are underdeveloped. Existing approaches focus on binary or continuous outcomes but fail to address the unique challenges of survival analysis, such as censoring and the integration of discrete and continuous time. To bridge this gap, we propose two novel methods for estimating target site-specific causal effects in multi-source settings. First, we develop a semiparametric efficient estimator for settings where individual-level data can be shared across sites. Second, we introduce a federated learning framework designed for privacy-constrained environments, which dynamically reweights source-specific contributions to account for discrepancies with the target population. Both methods leverage flexible, nonparametric machine learning models to improve robustness and efficiency. We illustrate the utility of our approaches through simulation studies and an application to multi-site randomized trials of monoclonal neutralizing antibodies for HIV-1 prevention, conducted among cisgender men and transgender persons in the United States, Brazil, Peru, and Switzerland, as well as among women in sub-Saharan Africa. Our findings underscore the potential of these methods to enable efficient, privacy-preserving causal inference for time-to-event outcomes under distribution shift.
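To make the idea of reweighting source-specific contributions concrete, the following is a minimal illustrative sketch, not the authors' estimator. It assumes each site shares only a summary pair (effect estimate, variance), and it uses a hypothetical discounting rule: a source's precision weight shrinks exponentially with its standardized squared discrepancy from the target-site estimate. The function name `federated_estimate` and the tuning parameter `tau` are assumptions introduced here for illustration.

```python
import math


def federated_estimate(target, sources, tau=1.0):
    """Combine site-level summaries without pooling individual-level data.

    target:  (estimate, variance) from the target site
    sources: list of (estimate, variance) pairs from source sites
    tau:     hypothetical tuning parameter; larger values tolerate
             more discrepancy between a source and the target
    Returns the combined estimate and the normalized weights
    (target first, then sources in input order).
    """
    t_est, t_var = target
    # The target site keeps its full precision weight.
    weights = [1.0 / t_var]
    estimates = [t_est]
    for s_est, s_var in sources:
        # Standardized squared discrepancy between source and target.
        z2 = (s_est - t_est) ** 2 / (s_var + t_var)
        # Precision weight, discounted as the discrepancy grows
        # (an assumed rule, chosen only for illustration).
        weights.append(math.exp(-z2 / (2.0 * tau)) / s_var)
        estimates.append(s_est)
    total = sum(weights)
    norm = [w / total for w in weights]
    combined = sum(w * e for w, e in zip(norm, estimates))
    return combined, norm


if __name__ == "__main__":
    # Made-up numbers: one source agrees with the target, one does not.
    target = (0.80, 0.01)
    sources = [(0.82, 0.01), (0.40, 0.01)]
    combined, weights = federated_estimate(target, sources)
    print(combined, weights)
```

In this toy example the discordant source receives a much smaller weight than the concordant one, so the combined estimate stays close to the target-site value. The numbers are invented for demonstration and do not come from the paper.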