🤖 AI Summary
This study addresses the coarsened data that arise in two-phase sampling, where only a subset of variables is collected in phase 1 and additional variables are measured on a phase-2 subsample. Under the coarsening-at-random assumption, that is, the phase-2 sampling mechanism depends only on fully observed variables, the authors review existing estimators, including the generalized raking estimator and inverse probability of censoring weighted targeted maximum likelihood estimation (IPCW-TMLE), along with extensions that also target the phase-2 sampling mechanism to improve efficiency. They then introduce a new class of estimators, constructed within the TMLE framework, which the abstract describes as asymptotically equivalent.
📝 Abstract
In a typical two-phase design, a random sample is drawn from the target population in phase 1, during which only a subset of variables is collected. In phase 2, a subsample of the phase-1 cohort is selected, and additional variables are measured. This design induces a coarsened data structure, because the phase-2 variables are observed only for the subsample. We assume coarsening at random, that is, the phase-2 sampling mechanism depends only on fully observed variables. We review existing estimators, including the generalized raking estimator and the inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE), along with its extensions that also target the phase-2 sampling mechanism to improve efficiency. We further introduce a new class of estimators constructed within the TMLE framework that are asymptotically equivalent.
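To make the two-phase setup concrete, the following is a minimal simulation sketch, not the estimators studied in the paper. It assumes a hypothetical phase-1 covariate `w` and outcome `y` observed for everyone, a hypothetical expensive covariate `x` measured only in phase 2, and a known phase-2 selection probability `pi` that depends only on `y` (so coarsening at random holds by construction). It compares the naive phase-2 mean of `x` with a simple Hajek-type inverse probability weighted mean, which is the weighting idea that IPCW-TMLE builds on; the targeting step of TMLE and the raking calibration are not implemented here.

```python
import numpy as np


def simulate_two_phase(n=20000, seed=0):
    """Simulate a toy two-phase design (all variable names are hypothetical)."""
    rng = np.random.default_rng(seed)
    # Phase 1: variables observed for every sampled unit.
    w = rng.normal(size=n)
    y = 1.0 + w + rng.normal(size=n)
    # Expensive covariate: generated for everyone, but treated as observed
    # only for units selected into phase 2.
    x = 2.0 + 0.5 * w + 0.5 * y + rng.normal(size=n)
    # Coarsening at random: the selection probability depends only on y,
    # which is fully observed in phase 1.
    pi = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * y)))
    r = rng.random(n) < pi  # phase-2 selection indicator
    return x, r, pi


def ipw_mean(x, r, pi):
    """Hajek-type inverse probability weighted mean using phase-2 units only."""
    weights = r / pi
    # np.where makes explicit that x contributes nothing when r is False.
    return np.sum(weights * np.where(r, x, 0.0)) / np.sum(weights)


x, r, pi = simulate_two_phase()
naive = x[r].mean()        # ignores the phase-2 sampling mechanism
ipw = ipw_mean(x, r, pi)   # reweights phase-2 units by 1 / pi
full = x.mean()            # benchmark, unavailable in a real two-phase study

print(f"full-cohort mean: {full:.3f}")
print(f"naive phase-2 mean: {naive:.3f}")
print(f"IPW phase-2 mean: {ipw:.3f}")
```

In this sketch the naive mean is biased because units with larger `y` are more likely to be selected, while the weighted mean corrects for the known selection probabilities. In practice `pi` may be known by design or estimated, and the estimators reviewed in the abstract aim to improve on plain weighting in terms of efficiency.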