Scalable Counterfactual Risk Estimation for Rare Events in Longitudinal Data

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the problem of estimating causal effects of time-varying interventions on rare survival outcomes in large-scale longitudinal observational studies, where high computational cost and severe class imbalance often hinder reliable inference. The authors propose a subsampling and inverse probability reweighting framework tailored to longitudinal survival data. It can be combined with existing causal estimators, such as the g-formula-based iterative conditional expectation (ICE) estimator, while preserving estimator consistency and substantially reducing computational burden. The approach is presented as the first subsampling strategy to address computational efficiency and statistical consistency together in causal inference for longitudinal rare events, and it mitigates the model instability induced by outcome imbalance. Simulations and an empirical analysis of electronic health records, assessing the effect of social and behavioral factors on suicide risk, indicate that the method markedly reduces computation while improving the stability and accuracy of causal estimates.

📝 Abstract

Estimating the causal effect of time-varying treatments on survival outcomes in large observational studies is computationally demanding, particularly when outcomes are rare. While g-formula-based methods such as the iterative conditional expectation (ICE) estimator provide a principled framework for longitudinal causal inference, they become computationally expensive, especially when bootstrap-based variance estimation is required. In addition, outcome rarity at each time point induces severe class imbalance, leading to instability and convergence issues in logistic regression and related models. To address these challenges, we propose a principled subsampling and reweighting strategy for longitudinal survival data that can be applied to a range of existing causal effect estimators in this setting, including the ICE estimator. The proposed method substantially reduces computational burden while preserving consistency and improving estimation stability in rare-outcome settings. We evaluate the method through simulations and validate it using a large-scale EHR cohort study on social and behavioral determinants of health (SBDH) and suicide risk, demonstrating its effectiveness for modeling rare outcomes in longitudinal data.
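The general idea of subsampling with reweighting for a rare outcome can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a single time point, a simple scheme that keeps every event and retains each non-event with a fixed probability, and inverse-probability weights of 1 for events and 1/keep_prob for non-events. The function name `subsample_with_weights`, the covariate `x`, and all numeric settings are hypothetical.

```python
import numpy as np


def subsample_with_weights(event, keep_prob, rng):
    """Keep every event row; keep each non-event row with probability keep_prob.

    Returns the indices of retained rows and their inverse-probability
    weights (1 for events, 1 / keep_prob for retained non-events).
    """
    event = np.asarray(event, dtype=bool)
    keep = event | (rng.random(event.shape[0]) < keep_prob)
    idx = np.flatnonzero(keep)
    weights = np.where(event[idx], 1.0, 1.0 / keep_prob)
    return idx, weights


# Simulated data (illustrative only): a binary covariate and a rare outcome.
rng = np.random.default_rng(0)
n = 200_000
x = rng.binomial(1, 0.3, size=n)
risk = np.where(x == 1, 0.004, 0.001)
y = rng.binomial(1, risk)

idx, w = subsample_with_weights(y, keep_prob=0.05, rng=rng)

# Risk among x == 1: full-data estimate vs. weighted subsample estimate.
full_risk = y[x == 1].mean()
sub_x, sub_y = x[idx], y[idx]
weighted_risk = np.sum(w * sub_y * (sub_x == 1)) / np.sum(w * (sub_x == 1))

print(f"rows kept: {idx.size} of {n}")
print(f"full-data risk: {full_risk:.5f}, weighted subsample risk: {weighted_risk:.5f}")
```

Because every event is retained and non-events are upweighted by the inverse of their retention probability, the weighted estimate targets the same quantity as the full-data estimate while fitting on far fewer rows. In a longitudinal analysis the same weights would be passed to each regression fit at each time point, which is where the computational savings would accumulate.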

Problem

Research questions and friction points this paper is trying to address.

rare events

longitudinal data

causal effect estimation

class imbalance

computational scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

subsampling

reweighting

rare events