🤖 AI Summary
This study addresses the challenge of efficiently detecting practically meaningful treatment effects under resource constraints and concurrent experimentation, where conventional allocation strategies that minimize mean squared error (MSE) often prove suboptimal. The authors propose a framework that instead minimizes the worst-case Type II error (i.e., the miss rate), directly targeting statistical power. They develop a variance inflation mechanism with correction factors, tailored to scenarios where outcome standard deviations are either known or estimated from pilot data, and formulate optimization models under three distinct risk criteria. A fully data-driven Surrogate-S algorithm implements the approach without requiring ground-truth variance information. Theoretical analysis demonstrates that MSE-oriented strategies can be highly inefficient for detection, while numerical experiments show that the proposed method achieves near-optimal performance using only pilot-based variance estimates.
📝 Abstract
Randomized experiments (often known as "A/B tests") are widely used to evaluate product and service innovations. We study how to allocate limited experimentation resources across M concurrent experiments in an experiment-rich regime. Existing work on allocation has predominantly focused on minimizing the worst-case mean squared error (MSE) of estimated treatment effects, which favors experiments with larger (and typically unknown) outcome variance. While appropriate for controlling estimation accuracy, this objective does not directly capture a common managerial priority in screening stages: detecting practically meaningful treatment effects with high probability.
Motivated by this, we consider the objective of minimizing the worst-case Type II error across all experiments. When the standard deviations are known, we characterize the power-optimal allocation and show that MSE-based allocations can be highly inefficient for detection, even though the two objectives align asymptotically. When the standard deviations are unknown and must be learned from pilot data, we show that a naive plug-in approach, treating pilot standard deviations as truth, can suffer substantial power loss.
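The contrast between the two objectives can be illustrated with a small numerical sketch (not the paper's exact model). Assume known standard deviations sigma_i, experiment-specific minimum detectable effects delta_i, a fixed total budget N, and a two-sided z-test with the usual normal approximation for power: a worst-case-MSE allocation equalizes sigma_i^2/n_i (so n_i is proportional to sigma_i^2), while equalizing power across experiments makes n_i proportional to (sigma_i/delta_i)^2. All numeric inputs below are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def worst_type2(n, sigmas, deltas, z=1.96):
    """Worst-case Type II error across experiments, using the normal
    approximation: power_i = Phi(delta_i * sqrt(n_i) / sigma_i - z)."""
    return max(1.0 - norm_cdf(d * math.sqrt(ni) / s - z)
               for ni, s, d in zip(n, sigmas, deltas))

# Hypothetical inputs: 3 concurrent experiments sharing a total budget N.
sigmas = [1.0, 2.0, 1.0]   # known outcome standard deviations
deltas = [1.0, 0.5, 0.2]   # practically meaningful effects to detect
N = 2000.0

# Worst-case-MSE allocation: equalize sigma_i^2 / n_i  =>  n_i ∝ sigma_i^2.
w_mse = [s ** 2 for s in sigmas]
n_mse = [N * w / sum(w_mse) for w in w_mse]

# Power-equalizing allocation: equalize the noncentrality
# delta_i * sqrt(n_i) / sigma_i  =>  n_i ∝ (sigma_i / delta_i)^2,
# so every experiment attains the same power.
w_pow = [(s / d) ** 2 for s, d in zip(sigmas, deltas)]
n_pow = [N * w / sum(w_pow) for w in w_pow]

print("worst Type II, MSE alloc  :", round(worst_type2(n_mse, sigmas, deltas), 4))
print("worst Type II, power alloc:", round(worst_type2(n_pow, sigmas, deltas), 4))
```

With these inputs the MSE allocation starves the hard-to-detect third experiment (small delta but unexceptional variance), so its miss rate dominates, while the power-equalizing allocation drives the worst-case Type II error far lower at the same budget.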
We propose inflating pilot estimates via correction factors and develop three optimization-based frameworks for selecting them, each reflecting a different risk criterion with distinct managerial implications. Although the resulting stochastic programs are computationally challenging at scale, we derive tractable surrogate reformulations inspired by robust optimization and establish favorable theoretical properties. We further propose Surrogate-S, a fully data-dependent and implementable procedure that computes correction factors using only pilot variance estimates and achieves near-oracle performance in numerical experiments.
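The plug-in power loss, and the effect of inflating pilot estimates, can be seen in a simple Monte Carlo sketch. The paper selects correction factors by solving optimization problems (and, in Surrogate-S, from pilot variance estimates alone); here a single fixed factor `c` and a power-targeting sample-size rule are purely illustrative assumptions, as are all numeric inputs.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def required_n(sigma, delta, z_a=1.96, z_b=1.2816):
    """Sample size targeting ~90% power to detect effect delta at level
    0.05 (two-sided, normal approximation)."""
    return ((z_a + z_b) * sigma / delta) ** 2

def realized_type2(n, sigma, delta, z_a=1.96):
    """Type II error actually achieved when the TRUE std dev is sigma."""
    return 1.0 - norm_cdf(delta * math.sqrt(n) / sigma - z_a)

random.seed(1)
sigma_true, delta = 2.0, 0.5     # ground truth, unknown to the experimenter
n_pilot, reps = 8, 5000
c = 1.25                          # fixed illustrative correction factor

t2_plugin = t2_inflated = 0.0
for _ in range(reps):
    # Pilot std-dev estimate from n_pilot mean-zero Gaussian observations
    # (known-mean variance estimator, for simplicity).
    s_hat = math.sqrt(sum(random.gauss(0.0, sigma_true) ** 2
                          for _ in range(n_pilot)) / n_pilot)
    t2_plugin   += realized_type2(required_n(s_hat, delta),
                                  sigma_true, delta)
    t2_inflated += realized_type2(required_n(c * s_hat, delta),
                                  sigma_true, delta)

print("avg realized Type II, plug-in :", round(t2_plugin / reps, 3))
print("avg realized Type II, inflated:", round(t2_inflated / reps, 3))
```

Because pilot runs are small, `s_hat` is noisy and the plug-in rule's average realized Type II error exceeds the nominal 10% target; inflating the estimate before sizing trades extra samples for a reliably lower miss rate, which is the role the correction factors play.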