Improving precision of A/B experiments using trigger intensity

📅 2024-11-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
In industrial A/B experiments, weak treatment effects often lead to insufficient statistical power. Existing methods leverage only binary trigger observations, i.e., whether the outputs of the treatment and control models differ, ignoring the magnitude of such differences, while fully annotating trigger observations is prohibitively costly. This paper introduces trigger intensity into the A/B evaluation framework for the first time, proposing two estimation paradigms: omniscient (full knowledge of trigger observations) and sampling-based (partial knowledge). The authors prove that the sampling bias is inversely proportional to the number of sampled observations and therefore vanishes asymptotically. The method integrates trigger identification, stratified sampling, bias analysis, and Monte Carlo simulation. In simulation, the omniscient approach reduces standard error by as much as 85%; on real-world business data, the sampling-based approach achieves a 36.48% reduction, significantly improving estimation precision and statistical power.
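
To make the dilution idea concrete, here is a minimal simulation sketch of the full-knowledge estimator. It is not code from the paper: the metric distribution, trigger rate, effect size, and the `one_experiment` helper are illustrative assumptions. It shows why comparing only trigger observations, then scaling back by the trigger rate, shrinks the standard error when most traffic is unaffected.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative assumptions (not from the paper): 100k users per arm and a
# 5% trigger rate, i.e. the treatment and control models would actually
# produce different outputs for only 5% of users.
n, trigger_rate, true_effect = 100_000, 0.05, 0.5

def one_experiment():
    # Counterfactual trigger flags, assumed known in the full-knowledge setting.
    trig_t = rng.random(n) < trigger_rate
    trig_c = rng.random(n) < trigger_rate
    y_t = rng.normal(10.0, 5.0, n) + true_effect * trig_t   # treatment metric
    y_c = rng.normal(10.0, 5.0, n)                          # control metric

    baseline = y_t.mean() - y_c.mean()          # difference over everyone
    # Full knowledge: compare triggered units only, then dilute by the
    # trigger rate so the estimate still targets the overall effect.
    diluted = trig_t.mean() * (y_t[trig_t].mean() - y_c[trig_c].mean())
    return baseline, diluted

ests = np.array([one_experiment() for _ in range(1_000)])
print("true overall effect :", trigger_rate * true_effect)
print("SE, baseline        :", ests[:, 0].std())
print("SE, trigger-diluted :", ests[:, 1].std())
```

Non-triggered units contribute pure noise to the plain difference-in-means, so dropping them before averaging removes that variance, while the dilution factor keeps the estimator targeting the overall average effect.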

📝 Abstract
In industry, online randomized controlled experiments (a.k.a. A/B experiments) are the standard approach to measuring the impact of a causal change. These experiments typically have small treatment effects in order to limit the potential blast radius. As a result, they often lack statistical significance due to a low signal-to-noise ratio. To improve precision (i.e., reduce standard error), we introduce the idea of trigger observations, observations for which the outputs of the treatment and control models differ. We show that evaluation with full information about trigger observations (full knowledge) improves precision over a baseline method. However, detecting all such trigger observations is costly, so we propose a sampling-based evaluation method (partial knowledge) to reduce the cost. The randomness of sampling introduces bias into the estimated outcome. We analyze this bias theoretically and show that it is inversely proportional to the number of observations used for sampling. We also compare the proposed evaluation methods using simulation and empirical data. In simulation, evaluation with full knowledge reduces the standard error by as much as 85%. In the empirical setup, evaluation with partial knowledge reduces the standard error by 36.48%.
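
Below is a minimal sketch of how the partial-knowledge idea could look, assuming trigger status is expensive to annotate and is therefore revealed only for a random sample of size m per arm. The estimator form, the `partial_knowledge_estimate` helper, and all constants are illustrative assumptions rather than the paper's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(11)

# Same illustrative population as above; m is the annotation budget per arm.
n, m, trigger_rate, true_effect = 100_000, 2_000, 0.05, 0.5

def partial_knowledge_estimate():
    trig_t = rng.random(n) < trigger_rate       # unknown to the analyst...
    trig_c = rng.random(n) < trigger_rate
    y_t = rng.normal(10.0, 5.0, n) + true_effect * trig_t
    y_c = rng.normal(10.0, 5.0, n)

    # ...except on random samples of size m, where trigger status is annotated.
    it = rng.choice(n, size=m, replace=False)
    ic = rng.choice(n, size=m, replace=False)
    p_hat = trig_t[it].mean()                   # sampled trigger-rate estimate

    # Difference in means among the sampled triggered units, diluted by p_hat.
    delta_trig = y_t[it][trig_t[it]].mean() - y_c[ic][trig_c[ic]].mean()
    return p_hat * delta_trig

print("one estimate:", partial_knowledge_estimate())
print("true overall:", trigger_rate * true_effect)
```
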
Problem

Research questions and friction points this paper is trying to address.

Enhancing A/B experiment precision via trigger intensity
Reducing the cost of detecting trigger observations
Analyzing bias in sampling-based evaluation methods (see the sketch after this list)
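
The bias claim can be illustrated with a generic ratio-estimator mechanism; this is a hedged stand-in, not the paper's proof. Any estimator that divides by a sampled trigger rate `p_hat` inherits an upward bias of order Var(p_hat), which scales as 1/m, because 1/x is convex (Jensen's inequality). A quick Monte Carlo, with the trigger rate `p` chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(5)
p, reps = 0.05, 200_000          # true trigger rate (illustrative)

for m in (500, 1_000, 2_000, 4_000):
    p_hat = rng.binomial(m, p, size=reps) / m   # sampled trigger-rate estimates
    bias = (1.0 / p_hat).mean() - 1.0 / p       # Jensen gap of the ratio
    print(f"m={m:5d}  bias={bias:7.4f}  m*bias={m * bias:6.1f}")
```

The roughly constant `m * bias` column is the signature of the 1/m decay described in the abstract.
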
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sampling-based evaluation method reduces annotation cost (a stratified-sampling sketch follows this list)
Bias inversely proportional to the number of sampled observations
Partial knowledge still cuts standard error significantly (36.48% on empirical data)
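
The AI summary names stratified sampling as one component of the pipeline. Below is a sketch of how a fixed annotation budget might be stratified by a cheap proxy score; the strata, the `proxy` score, the per-stratum trigger rates, and the proportional allocation rule are all illustrative assumptions, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(13)

# Illustrative stratified annotation: a cheap proxy score splits traffic
# into strata with very different trigger rates; annotating within strata
# and reweighting by stratum size lowers the variance of the trigger-rate
# estimate versus spending the same budget on one uniform sample.
n, budget = 100_000, 2_000
proxy = rng.random(n)                          # hypothetical cheap score
strata = np.digitize(proxy, [0.5, 0.9])        # 3 strata: low/mid/high
p_by_stratum = np.array([0.01, 0.10, 0.40])    # assumed trigger rates
triggered = rng.random(n) < p_by_stratum[strata]

# Proportional allocation of the annotation budget across strata.
est = 0.0
for s in range(3):
    members = np.flatnonzero(strata == s)
    m_s = int(round(budget * members.size / n))
    sample = rng.choice(members, size=m_s, replace=False)
    est += (members.size / n) * triggered[sample].mean()

print("stratified trigger-rate estimate:", est)
print("true trigger rate               :", triggered.mean())
```

When trigger rates differ sharply across strata, proportional allocation removes the between-stratum component of variance from the trigger-rate estimate, compared with one uniform sample of the same size.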