AI-Assisted Variance Reduction in Randomized Experiments

📅 2026-06-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the high variance often encountered in randomized experiments, particularly those with abundant unstructured data such as text. It proposes including large language model (LLM) predictions of human behavior, so-called “digital twin” predictions, as covariates in standard regression adjustment, much like adjusting for a prognostic score. The approach has a “do no harm” property: the adjusted estimator reverts to the unadjusted difference in means when the AI predictions are uninformative. The paper also gives implementation guidance on obtaining continuous scores from discrete LLM outputs and on using LLMs to featurize unstructured inputs as auxiliary covariates. In simulations and three empirical applications (a survey mega-study, an email marketing A/B test, and a large-scale technology platform experiment), the method yields real but modest efficiency gains, with larger benefits in text-rich settings, and the do no harm property is confirmed empirically.
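One way to picture the step from discrete LLM outputs to a continuous score is a probability-weighted average over the answer options. The sketch below is an illustration only: this page does not say which construction the paper uses, and the function name `expected_score` and the input format (a mapping from numeric option values to log probabilities) are hypothetical.

```python
import math


def expected_score(option_logprobs):
    """Collapse a discrete answer distribution into one continuous score.

    option_logprobs: dict mapping a numeric answer option (e.g. a Likert
    value) to the log probability the model assigns to it. This input
    format is an assumption made for illustration.
    """
    # Subtract the max log probability before exponentiating for stability.
    top = max(option_logprobs.values())
    weights = {opt: math.exp(lp - top) for opt, lp in option_logprobs.items()}
    total = sum(weights.values())
    # Renormalize so the weights sum to one, then take the expectation.
    return sum(opt * w for opt, w in weights.items()) / total
```

A score built this way can take any value between the smallest and largest option, so it may carry more information as a covariate than the single most likely answer. Averaging several sampled answers would be another plausible route.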

📝 Abstract

Generative AI and large language models can produce realistic predictions of human behavior from rich, unstructured inputs with little to no task-specific training data. Recent work uses these “digital twin” predictions to supplement human responses in surveys and experiments. We study the special case of using AI-generated predictions to reduce variance in randomized experiments. We argue that doing so requires no new estimators and that researchers can simply include AI predictions as covariates in standard regression adjustment, analogous to adjusting for a prognostic score. A benefit of this approach is a “do no harm” property whereby the adjusted estimator reverts to the unadjusted difference in means when predictions are uninformative. Other methods, such as variants of prediction-powered inference, do not have this guarantee. We provide implementation guidance, including how to obtain continuous scores from discrete LLM outputs and how to use LLMs to featurize unstructured inputs as auxiliary covariates. We demonstrate these ideas in simulations and three empirical applications: a survey mega-study, an email marketing A/B test, and a large-scale technology platform experiment. Overall, efficiency gains are real if modest, with greater benefits in studies that contain substantial text and other unstructured data. We also confirm the do no harm property empirically. Given these gains and limited costs, we recommend adjusting for AI-generated predictions as a regular empirical practice.
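The regression adjustment described in the abstract can be sketched on simulated data. The code below is a minimal illustration, not the authors' implementation: it assumes a fully interacted OLS specification with a mean-centered score (one common form of regression adjustment), and the names `diff_in_means`, `adjusted_ate`, and `simulate`, along with the data-generating process, are hypothetical.

```python
import numpy as np


def diff_in_means(y, t):
    """Unadjusted estimator: treated mean minus control mean."""
    return y[t == 1].mean() - y[t == 0].mean()


def adjusted_ate(y, t, score):
    """OLS of y on treatment, the centered AI score, and their interaction.

    With the score centered, the coefficient on t estimates the average
    treatment effect.
    """
    sc = score - score.mean()
    X = np.column_stack([np.ones_like(y), t, sc, t * sc])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def simulate(signal, n=1000, reps=300, seed=0):
    """Monte Carlo variance of both estimators (n is assumed to be even).

    signal sets how strongly the simulated AI score predicts the outcome;
    signal=0 mimics completely uninformative predictions.
    """
    rng = np.random.default_rng(seed)
    raw, adj = [], []
    for _ in range(reps):
        score = rng.normal(size=n)
        # Complete randomization: exactly half of the units are treated.
        t = rng.permutation(np.repeat([0.0, 1.0], n // 2))
        y = 1.0 * t + signal * score + rng.normal(size=n)
        raw.append(diff_in_means(y, t))
        adj.append(adjusted_ate(y, t, score))
    return np.var(raw), np.var(adj)
```

Calling `simulate(signal=2.0)` and `simulate(signal=0.0)` returns the two variances for an informative and an uninformative score. In this toy setup the adjusted variance should be clearly smaller in the first case and close to the unadjusted one in the second, which is the behavior the do no harm property describes.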

Problem

Research questions and friction points this paper is trying to address.

variance reduction

randomized experiments

AI-assisted estimation

covariate adjustment

digital twins

Innovation

Methods, ideas, or system contributions that make the work stand out.

variance reduction

AI-assisted estimation

regression adjustment