🤖 AI Summary
Measurement error in exposure variables commonly biases causal effect estimates in observational studies. Existing correction methods either rely on strong parametric assumptions or are tailored to narrowly defined target quantities, which limits their flexibility across study designs. To address this, we propose a doubly robust estimation framework that adapts the control variates approach of Yang and Ding (2019) to causal inference under exposure measurement error. The approach accommodates a range of two-phase sampling designs (e.g., a validation subsample with gold-standard exposure measurements nested within a full cohort with error-prone measurements) and relies on standard causal assumptions rather than restrictive parametric models. Simulation studies show favorable finite-sample performance relative to leading correction methods. The method is illustrated with electronic health record data on HIV outcomes from Vanderbilt’s Comprehensive Care Clinic.
📝 Abstract
Exposure measurement error is a ubiquitous but often overlooked challenge in causal inference with observational data. Existing methods accounting for exposure measurement error largely rely on restrictive parametric assumptions, while emerging data-adaptive estimation approaches allow for less restrictive assumptions but at the cost of flexibility, as they are typically tailored towards rigidly defined statistical quantities. There remains a critical need for assumption-lean estimation methods that are both flexible and possess desirable theoretical properties across a variety of study designs. In this paper, we introduce a general framework for estimation of causal quantities in the presence of exposure measurement error, adapted from the control variates approach of Yang and Ding (2019). Our method can be implemented in various two-phase sampling study designs, where one obtains gold-standard exposure measurements for a small subset of the full study sample, called the validation data. The control variates framework leverages both the error-prone and error-free exposure measurements by augmenting an initial consistent estimator from the validation data with a variance reduction term formed from the full data. We show that our method inherits double robustness under standard causal assumptions. Simulation studies show that our approach performs favorably compared to leading methods under various two-phase sampling schemes. We illustrate our method with observational electronic health record data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.