🤖 AI Summary
This study addresses the challenge of estimating causal effects of time-varying interventions on rare survival outcomes in large-scale longitudinal observational studies, where high computational cost and severe class imbalance often hinder reliable inference. The authors propose a subsampling and inverse probability reweighting framework tailored to longitudinal survival data. The framework can be combined with existing causal estimators, such as the g-formula-based iterative conditional expectation (ICE) estimator, and it preserves estimator consistency while substantially reducing computational burden. The approach is presented as the first subsampling strategy to jointly address computational efficiency and statistical consistency in causal inference for longitudinal rare events, and it mitigates the model instability induced by outcome imbalance. Simulations and an empirical analysis of electronic health records, assessing the impact of social and behavioral factors on suicide risk, indicate that the method markedly improves computational efficiency while enhancing both the stability and accuracy of causal estimates.
📝 Abstract
Estimating the causal effect of time-varying treatments on survival outcomes in large observational studies is computationally demanding, particularly when outcomes are rare. While g-formula-based methods such as the iterative conditional expectation (ICE) estimator provide a principled framework for longitudinal causal inference, they become computationally expensive, especially when bootstrap-based variance estimation is required. In addition, outcome rarity at each time point induces severe class imbalance, leading to instability and convergence issues in logistic regression and related models. To address these challenges, we propose a principled subsampling and reweighting strategy for longitudinal survival data that can be applied to a range of existing causal effect estimators in this setting, including the ICE estimator. The proposed method substantially reduces computational burden while preserving consistency and improving estimation stability in rare-outcome settings. We evaluate the method through simulations and validate it using a large-scale electronic health record (EHR) cohort study on social and behavioral determinants of health (SBDH) and suicide risk, demonstrating its effectiveness for modeling rare outcomes in longitudinal data.
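The general idea of subsampling with inverse probability reweighting can be illustrated with a minimal sketch. This is not the authors' procedure, whose details the abstract does not give. It assumes a simple design in which every event row is kept and each non-event row is kept with a fixed probability. The function name `subsample_with_weights`, the parameter `keep_prob`, and the simulated event rate are all hypothetical choices made for illustration.

```python
import numpy as np

def subsample_with_weights(event, keep_prob, rng):
    """Keep every event row; keep each non-event row with probability keep_prob.

    Returns the indices of retained rows and their inverse probability weights.
    This is an assumed, generic design, not the scheme from the paper.
    """
    event = np.asarray(event, dtype=bool)
    # Retention probability per row: 1 for events, keep_prob for non-events.
    pi = np.where(event, 1.0, keep_prob)
    kept = rng.random(event.shape[0]) < pi
    idx = np.flatnonzero(kept)
    # Each retained row is weighted by the inverse of its retention probability.
    weights = 1.0 / pi[idx]
    return idx, weights

rng = np.random.default_rng(0)
n = 200_000
event = rng.random(n) < 0.002  # simulated rare outcome, roughly 0.2% of rows

idx, w = subsample_with_weights(event, keep_prob=0.05, rng=rng)

full_rate = event.mean()
# Weighted event rate computed on the subsample only.
weighted_rate = np.sum(w * event[idx]) / np.sum(w)
print(len(idx), full_rate, weighted_rate)
```

In this sketch the subsample contains only a small fraction of the rows, yet the weighted event rate stays close to the full-data rate. In practice the same weights would be passed to a weighted regression fit at each time point, although how the paper integrates them with the ICE estimator is not specified here.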