Improving RCT-Based CATE Estimation Under Covariate Mismatch via Double Calibration

📅 2026-03-17
🤖 AI Summary
This study addresses the estimation of conditional average treatment effects (CATE) when the covariates available in a randomized controlled trial (RCT) and in an observational study do not fully match. The authors propose MR-OSCAR, an RCT-calibrated, two-stage method: in the first stage, covariates missing from the RCT are imputed using a mapping learned from the observational data; in the second stage, outcome predictions from the observational model are calibrated to the RCT, which preserves the causal contrast. The theory provides finite-sample guarantees with an error decomposition that includes an imputation error term, which shrinks as the missing covariates become more predictable. In simulations, imputation almost always outperforms a benchmark that uses only the shared covariates, with the clearest gains when the missing covariates are highly predictable and the RCT sample size is moderate; borrowing does not help under poor predictability or when trial-only moderators dominate. The approach is motivated by the Greenlight Plus trial on early childhood obesity, and an analysis of Vanderbilt electronic health records is outlined as forthcoming.

📝 Abstract
We develop estimators that improve the precision of heterogeneous treatment effect estimates by borrowing information from observational studies when the covariates available in each data source do not perfectly match. Standard data-borrowing methods often assume perfectly matched covariates. We propose MR-OSCAR, an RCT-calibrated, two-stage estimation approach that first predicts the trial-missing variables via an imputation model learned from the observational data and then calibrates observational outcome predictions to the randomized trial, preserving the causal contrast. This contrasts with results for generalization, where imputation does not improve performance. Our theory gives finite-sample guarantees with a transparent error decomposition, including an imputation error that shrinks as the observational mapping becomes more predictable. Simulations show that imputation almost always outperforms naively using only the shared covariates, and they clarify when borrowing helps (strong predictability of the missing block, moderate trial size) and when it does not (poor predictability or dominant trial-only moderators). We motivate the approach with the Greenlight Plus trial on early childhood obesity and outline a forthcoming EHR analysis at Vanderbilt, highlighting the use of our method in common scenarios where data do not perfectly align.
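The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' MR-OSCAR estimator: it assumes linear models throughout, a known randomization probability `p_treat`, and an inverse-probability-weighted pseudo-outcome for the calibration step. All function and variable names (`fit_linear`, `predict_linear`, `two_stage_cate`, `x_obs`, `z_obs`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_linear(features, target):
    """Ordinary least squares with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_linear(coef, features):
    """Predictions from coefficients produced by fit_linear."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


def two_stage_cate(x_obs, z_obs, a_obs, y_obs, x_rct, a_rct, y_rct, p_treat=0.5):
    """Illustrative impute-then-calibrate CATE estimate on the trial sample.

    x_*: shared covariates; z_obs: covariates missing from the trial;
    a_*: binary treatment indicators; y_*: outcomes.
    """
    # Stage 1 (assumed linear): learn z ~ x in the observational data,
    # then impute the trial-missing block for the trial participants.
    imputer = fit_linear(x_obs, z_obs)
    z_hat = predict_linear(imputer, x_rct)

    # Observational outcome models, one per arm, on the full covariate set.
    xz_obs = np.column_stack([x_obs, z_obs])
    xz_rct = np.column_stack([x_rct, z_hat])
    mu1 = fit_linear(xz_obs[a_obs == 1], y_obs[a_obs == 1])
    mu0 = fit_linear(xz_obs[a_obs == 0], y_obs[a_obs == 0])
    tau_obs = predict_linear(mu1, xz_rct) - predict_linear(mu0, xz_rct)

    # Stage 2 (assumed form): calibrate to the trial. Under randomization with
    # known p_treat, this pseudo-outcome has conditional mean equal to the CATE.
    pseudo = y_rct * (a_rct - p_treat) / (p_treat * (1.0 - p_treat))
    correction = fit_linear(x_rct, pseudo - tau_obs)
    return tau_obs + predict_linear(correction, x_rct)
```

In this sketch the observational model supplies the shape of the effect surface, while the trial-based correction term absorbs any bias in the observational contrast that is expressible in the shared covariates. A noisier imputation of the missing block degrades `tau_obs`, which is the intuition behind an imputation error term in the error decomposition.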
Problem

Research questions and friction points this paper is trying to address.

CATE estimation
covariate mismatch
data borrowing
randomized controlled trial
observational study
Innovation

Methods, ideas, or system contributions that make the work stand out.

CATE estimation
covariate mismatch
double calibration
data borrowing
imputation
Samhita Pal
Department of Biostatistics, Vanderbilt University Medical Center
Jared D. Huling
Assistant Professor of Biostatistics, School of Public Health, University of Minnesota
statistics, biostatistics
Amir Asiaee
Department of Biostatistics, Vanderbilt University Medical Center