🤖 AI Summary
EHR-based causal inference is often biased by unmeasured confounding. To address this, we propose a two-stage proxy variable approach: in Stage I, factor analysis is applied to the observed proxies and the treatment variable to extract latent factors that act as surrogates for the unmeasured confounders; in Stage II, covariates built from these latent factors are added to a standard outcome regression model to improve causal effect estimation. This work is the first to systematically integrate factor analysis into the proxy variable framework without requiring strong distributional assumptions or full joint modeling, which makes it both more practical and more robust. In simulations featuring non-normal errors and model misspecification, as well as on real-world EHR data (evaluating the effect of hospital admission among older adults presenting with chest pain), our method reduces estimation bias by over 40% compared to conventional covariate adjustment, yielding more plausible and reliable causal estimates.
📝 Abstract
Electronic health records (EHRs) are used to study treatment effects in clinical settings, yet unmeasured confounding remains a persistent challenge. Indirect measurements of the unmeasured confounder (proxies) offer a potential solution, but existing approaches, such as proximal inference or full joint modeling, can be difficult to implement. We propose a two-stage, proxy-based method that is practical, broadly applicable, and robust. In the first stage, we apply factor analysis to the proxy and treatment variables, extracting information on latent factors that serve as a surrogate for the unmeasured confounder. In the second stage, we use the fitted factor model to build covariates that improve causal effect estimation in a standard outcome regression model. Through simulations, we test the method's performance under assumption violations, including non-normal errors, model misspecification, and scenarios where instruments or confounders are incorrectly treated as proxies. We also apply the method to estimate the effect of hospital admission for older adults presenting to the emergency department with chest pain, a setting where standard analyses may fail to recover plausible effects. Our results show that this simplified strategy recovers more reliable estimates than conventional adjustment methods, offering applied researchers a practical tool for addressing unmeasured confounding with proxy variables.
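The two-stage idea described above can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration and is not taken from the paper: the linear-Gaussian data-generating process, a continuous treatment, a single latent factor, the use of scikit-learn's `FactorAnalysis`, and the choice of the posterior-mean factor score as the constructed covariate. The authors' actual covariate construction may differ.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis


def ols_coefs(y, X):
    """Least-squares coefficients of y on X, with an intercept prepended."""
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def simulate(n=20000, seed=0):
    """Hypothetical data: one unmeasured confounder u, four proxies w."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                       # unmeasured confounder
    loadings = np.array([0.9, 0.8, 0.7, 0.6])    # assumed proxy loadings
    w = u[:, None] * loadings + rng.normal(size=(n, 4))
    a = 0.8 * u + rng.normal(size=n)             # treatment depends on u
    y = 1.0 * a + 1.5 * u + rng.normal(size=n)   # true effect of a is 1.0
    return w, a, y


def two_stage_estimate(w, a, y, n_factors=1):
    """Stage I: factor analysis on proxies and treatment.
    Stage II: outcome regression on treatment plus factor scores."""
    fa = FactorAnalysis(n_components=n_factors, tol=1e-4,
                        max_iter=5000, random_state=0)
    # Posterior-mean factor scores given proxies and treatment.
    scores = fa.fit_transform(np.column_stack([w, a]))
    # Index 0 is the intercept, index 1 the treatment coefficient.
    return ols_coefs(y, np.column_stack([a, scores]))[1]


w, a, y = simulate()
naive = ols_coefs(y, a)[1]                             # no adjustment
proxy_adjusted = ols_coefs(y, np.column_stack([a, w]))[1]  # proxies as covariates
two_stage = two_stage_estimate(w, a, y)

print(f"true effect:        1.000")
print(f"unadjusted:         {naive:.3f}")
print(f"proxies as covars:  {proxy_adjusted:.3f}")
print(f"two-stage (sketch): {two_stage:.3f}")
```

The sign and scale of the extracted factor are arbitrary, but this does not matter in the second stage because the regression coefficient on the factor score absorbs them. Under the assumed model, the factor score is a linear projection of the confounder on the proxies and the treatment, which is why including it alongside the treatment can remove confounding bias that plain proxy adjustment leaves behind.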