Targeted Data Fusion for Causal Survival Analysis Under Distribution Shift

📅 2025-01-30

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This study addresses causal survival analysis across multiple heterogeneous data sources, where right-censoring, the integration of discrete and continuous time, and cross-site distribution shift complicate the estimation of target-site-specific causal effects. It proposes two methods: (i) a semiparametric efficient estimator for settings where individual-level data can be shared across sites, and (ii) a federated learning framework for privacy-constrained settings that dynamically reweights source-site contributions according to their discrepancy with the target population. Both methods rely on flexible, nonparametric machine learning models to improve robustness and efficiency. The methods are illustrated in simulation studies and in an application to multi-site randomized trials of monoclonal neutralizing antibodies for HIV-1 prevention, pointing toward efficient, privacy-preserving causal inference for time-to-event outcomes across diverse populations and geographic regions.

📝 Abstract

Causal inference across multiple data sources offers a promising avenue to enhance the generalizability and replicability of scientific findings. However, data integration methods for time-to-event outcomes, common in biomedical research, are underdeveloped. Existing approaches focus on binary or continuous outcomes but fail to address the unique challenges of survival analysis, such as censoring and the integration of discrete and continuous time. To bridge this gap, we propose two novel methods for estimating target site-specific causal effects in multi-source settings. First, we develop a semiparametric efficient estimator for settings where individual-level data can be shared across sites. Second, we introduce a federated learning framework designed for privacy-constrained environments, which dynamically reweights source-specific contributions to account for discrepancies with the target population. Both methods leverage flexible, nonparametric machine learning models to improve robustness and efficiency. We illustrate the utility of our approaches through simulation studies and an application to multi-site randomized trials of monoclonal neutralizing antibodies for HIV-1 prevention, conducted among cisgender men and transgender persons in the United States, Brazil, Peru, and Switzerland, as well as among women in sub-Saharan Africa. Our findings underscore the potential of these methods to enable efficient, privacy-preserving causal inference for time-to-event outcomes under distribution shift.
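To make the idea of correcting for distribution shift with censored outcomes concrete, the sketch below computes a weighted Kaplan-Meier curve in which each source-site subject carries a weight. It is an illustration only and not the estimator proposed in the paper: the function name `weighted_kaplan_meier` is hypothetical, and the weights are assumed to be density ratios (target covariate density over source covariate density) supplied from elsewhere.

```python
import numpy as np

def weighted_kaplan_meier(time, event, weight):
    """Weighted Kaplan-Meier survival curve.

    time   : observed follow-up time (event or censoring) per subject
    event  : 1 if the event was observed, 0 if right-censored
    weight : nonnegative subject weight (ASSUMPTION: a density ratio that
             reweights source-site subjects toward the target population)

    Returns the distinct event times and the survival probability
    just after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    weight = np.asarray(weight, dtype=float)

    event_times = np.unique(time[event])  # sorted distinct event times
    surv = np.empty(len(event_times))
    s = 1.0
    for k, t in enumerate(event_times):
        at_risk = weight[time >= t].sum()            # weighted risk set at t
        failed = weight[(time == t) & event].sum()   # weighted events at t
        s *= 1.0 - failed / at_risk                  # product-limit update
        surv[k] = s
    return event_times, surv

# Toy data: four subjects, the second one is censored at time 2.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
unweighted = weighted_kaplan_meier(times, events, [1, 1, 1, 1])
reweighted = weighted_kaplan_meier(times, events, [2, 1, 1, 1])
```

With unit weights the function reduces to the ordinary Kaplan-Meier estimator; upweighting the first subject, who fails early, pulls the reweighted curve down.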

Problem

Research questions and friction points this paper is trying to address.

Address causal survival analysis challenges under distribution shift

Develop methods for multi-source data with privacy constraints

Improve robustness in time-to-event outcome integration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semiparametric efficient estimator for shared individual-level data

Federated learning framework for privacy-constrained environments

Nonparametric machine learning models for robustness and efficiency
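The federated idea listed above, in which sites share only summary statistics and sources that differ from the target are downweighted, can be sketched as follows. This is a simplified heuristic under stated assumptions, not the paper's weighting scheme: the function `combine_site_estimates`, the `penalty` parameter, and the inverse "variance plus squared discrepancy" rule are all hypothetical choices made for illustration.

```python
import numpy as np

def combine_site_estimates(target_est, target_var, source_est, source_var, penalty=1.0):
    """Combine a target-site estimate with source-site estimates.

    Only per-site summaries (point estimate and variance) are exchanged,
    so no individual-level data leaves a site.

    ASSUMPTION: each source is weighted by the inverse of
    (variance + penalty * squared discrepancy from the target estimate),
    a rough proxy for its mean squared error with respect to the target.
    """
    source_est = np.asarray(source_est, dtype=float)
    source_var = np.asarray(source_var, dtype=float)

    # Squared gap between each source estimate and the target estimate.
    discrepancy_sq = (source_est - target_est) ** 2

    # The target site has no discrepancy term; sources are penalized by theirs.
    raw = np.concatenate((
        [1.0 / target_var],
        1.0 / (source_var + penalty * discrepancy_sq),
    ))
    weights = raw / raw.sum()  # normalize so the weights sum to one

    estimates = np.concatenate(([target_est], source_est))
    return float(weights @ estimates), weights

# Toy example: survival probability at a fixed time point, estimated at a
# target site and two source sites. The second source disagrees with the target.
combined, weights = combine_site_estimates(
    target_est=0.80, target_var=0.01,
    source_est=[0.80, 0.50], source_var=[0.01, 0.01],
)
```

In this toy example the source that agrees with the target receives the same weight as the target itself, while the discrepant source contributes far less to the combined estimate.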

🔎 Similar Papers

FPBoost: Fully Parametric Gradient Boosting for Survival Analysis

2024-09-20, arXiv.org, Citations: 0
