🤖 AI Summary
In large-scale online experiments, using the same data for both treatment selection and effect estimation induces a “winner’s curse,” causing upward bias in conventional difference-in-means estimators and miscalibrated confidence intervals. To address this, we propose a Bayesian mixture shrinkage method that introduces experiment-specific local shrinkage factors, combining a global prior with local adaptivity while enabling efficient posterior inference without numerical integration. By employing hierarchical modeling and analytical approximation techniques, the approach substantially mitigates selection bias and is robust to prior misspecification. Simulation studies and industrial-scale empirical evaluations show that the method outperforms existing benchmarks in both estimation accuracy and uncertainty calibration, yielding well-calibrated credible intervals and approximately unbiased effect estimates, while supporting scalable deployment across thousands of concurrent experiments.
📝 Abstract
A 'Winner's Curse' arises in large-scale online experimentation platforms when the same experiments are used to both select treatments and evaluate their effects. In these settings, classical difference-in-means estimators of treatment effects are upwardly biased and conventional confidence intervals are rendered invalid. The bias scales with the magnitude of sampling variability and the selection threshold, and inversely with the treatment's true effect size. We propose a new Bayesian approach that incorporates experiment-specific 'local shrinkage' factors that mitigate sensitivity to the choice of prior and improve robustness to assumption violations. We demonstrate how the associated posterior distribution can be estimated without numerical integration techniques, making it a practical choice for at-scale deployment. Through simulation, we evaluate the performance of our approach under various scenarios and find that it performs well even when assumptions about the sampling and selection processes are violated. In an empirical evaluation, our approach demonstrated superior performance over alternative methods, providing more accurate estimates with well-calibrated uncertainty quantification.
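The selection bias described above is easy to reproduce in a few lines. The sketch below is purely illustrative (it is not the paper's method): it simulates many experiments with a known true effect, applies a one-sided selection rule at a conventional significance threshold, and compares the naive difference-in-means estimate before and after selection. All parameter values here are assumptions chosen for the demonstration.

```python
import numpy as np

# Illustrative winner's-curse simulation (not the paper's estimator).
rng = np.random.default_rng(0)

n_experiments = 100_000
true_effect = 0.0        # assumed true treatment effect (null here)
se = 1.0                 # assumed standard error of each estimate
threshold = 1.96 * se    # selection rule: estimate must clear a ~95% cutoff

# Each experiment's difference-in-means estimate is noise around the truth.
estimates = rng.normal(true_effect, se, size=n_experiments)

# Keep only the "winners" that pass the selection threshold.
selected = estimates[estimates > threshold]

# Unconditionally the estimator is unbiased, but conditional on selection
# it is biased upward by roughly the truncated-normal mean.
print(f"mean over all estimates: {estimates.mean():.3f}")  # near 0
print(f"mean among selected:     {selected.mean():.3f}")   # well above 0
```

With a true effect of zero, the mean among selected experiments is approximately E[Z | Z > 1.96] ≈ 2.34 standard errors, illustrating how the bias grows with the selection threshold and with sampling variability.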