Incorporating External Controls for Estimating the Average Treatment Effect on the Treated with High-Dimensional Data: Retaining Double Robustness and Ensuring Double Safety

2025-09-24
AI Summary
In high-dimensional settings, estimating the average treatment effect on the treated (ATT) with external controls can lose efficiency under model misspecification. To address this, we propose a novel doubly robust estimator. We first establish that direct incorporation of external controls can impair estimation efficiency. Our method introduces the “doubly safe” property: it is no less efficient than the standard doubly robust estimator without external controls even if either the propensity score model or the outcome model is misspecified, and it attains the semiparametric efficiency bound when both models are correctly specified. Built on high-dimensional asymptotics and the doubly robust framework, the estimator enables valid high-dimensional confounder adjustment using large-scale historical data such as electronic health records. Theoretical analysis, extensive simulations, and a real-data application support the estimator's robustness and efficiency.
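
For orientation, the estimand and the textbook doubly robust ATT estimator (without external controls) can be written as below. The notation is an assumption made for illustration and is not taken from the paper: $A$ is the treatment indicator, $X$ the confounders, $Y$ the outcome, $\hat\pi(x)$ an estimate of the propensity score $P(A = 1 \mid X = x)$, $\hat m_0(x)$ an estimate of the control outcome mean $E[Y \mid A = 0, X = x]$, and $n_1 = \sum_i A_i$. This is the standard form, not the estimator proposed in the paper.

```latex
\tau_{\mathrm{ATT}} = E\left[\, Y(1) - Y(0) \mid A = 1 \,\right]

\hat\tau_{\mathrm{DR}}
  = \frac{1}{n_1} \sum_{i=1}^{n}
    \left\{ A_i - (1 - A_i)\,\frac{\hat\pi(X_i)}{1 - \hat\pi(X_i)} \right\}
    \left\{ Y_i - \hat m_0(X_i) \right\}
```

The estimator is consistent when either $\hat\pi$ or $\hat m_0$ converges to the truth, which is the double robustness referred to in the summary.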

Abstract
Randomized controlled trials (RCTs) are widely regarded as the gold standard for causal inference in biomedical research. For instance, when estimating the average treatment effect on the treated (ATT), a doubly robust estimation procedure can be applied, requiring either the propensity score model or the control outcome model to be correctly specified. In this paper, we address scenarios where external control data, often with a much larger sample size, are available. Such data are typically easier to obtain from historical records or third-party sources. However, we find that incorporating external controls into the standard doubly robust estimator for ATT may paradoxically result in reduced efficiency compared to using the estimator without external controls. This counterintuitive outcome suggests that the naive incorporation of external controls could be detrimental to estimation efficiency. To resolve this issue, we propose a novel doubly robust estimator that guarantees higher efficiency than the standard approach without external controls, even under model misspecification. When all models are correctly specified, this estimator aligns with the standard doubly robust estimator that incorporates external controls and achieves semiparametric efficiency. The asymptotic theory developed in this work applies to high-dimensional confounder settings, which are increasingly common with the growing prevalence of electronic health record data. We demonstrate the effectiveness of our methodology through extensive simulation studies and a real-world data application.
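
As a point of reference for the "standard doubly robust estimator" without external controls, the following is a minimal sketch in Python using only NumPy. It is not the estimator proposed in the paper, and it does not use external controls. The function names (`fit_logistic`, `dr_att`, `simulate`), the simulated data-generating process, and the use of unpenalized logistic and linear working models are all assumptions made for illustration. A high-dimensional setting like the one the paper studies would typically need regularized nuisance models instead.

```python
import numpy as np


def fit_logistic(X, a, n_iter=50, ridge=1e-8):
    """Logistic regression via Newton-Raphson (X must include an intercept column).

    A tiny ridge term is added to the Hessian purely for numerical stability.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (a - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def dr_att(X, a, y):
    """Textbook doubly robust ATT estimate from trial data only (illustrative)."""
    Xd = np.column_stack([np.ones(len(y)), X])

    # Working propensity score model: P(A = 1 | X), clipped away from 0 and 1.
    pi_hat = 1.0 / (1.0 + np.exp(-(Xd @ fit_logistic(Xd, a))))
    pi_hat = np.clip(pi_hat, 1e-3, 1.0 - 1e-3)

    # Working control outcome model: least squares fitted on controls only.
    ctrl = a == 0
    gamma, *_ = np.linalg.lstsq(Xd[ctrl], y[ctrl], rcond=None)
    m0_hat = Xd @ gamma

    # Treated units get weight 1; controls get minus the odds of treatment.
    weights = a - (1.0 - a) * pi_hat / (1.0 - pi_hat)
    return float(np.sum(weights * (y - m0_hat)) / np.sum(a))


def simulate(n=5000, seed=0, effect=2.0):
    """Hypothetical data with a constant treatment effect, so ATT equals `effect`."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    pi = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
    a = rng.binomial(1, pi).astype(float)
    y0 = 1.0 + X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=n)
    y = y0 + effect * a
    return X, a, y


if __name__ == "__main__":
    X, a, y = simulate()
    print("DR ATT estimate:", dr_att(X, a, y))
```

In this simulation both working models are correctly specified, so the estimate should land close to the true effect of 2. Replacing either working model with a misspecified one (but not both) is a simple way to see double robustness in action.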

Problem

Research questions and friction points this paper is trying to address.

Improving ATT estimation efficiency when incorporating external control data
Resolving paradoxical efficiency loss from naive external control integration
Developing doubly robust estimators for high-dimensional confounder settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

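Novel doubly robust estimator for ATT that incorporates external controls
Guarantees higher efficiency than the standard estimator without external controls, even under model misspecification
Asymptotic theory that covers high-dimensional confounder settings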