TS-Neyman: Posterior Sampling for Adaptive Stratified Estimation

📅 2026-06-06

🤖 AI Summary

This work addresses how to efficiently estimate an average loss or subgroup metric on a stratified data pool when each label or human evaluation is costly. The authors propose TS-Neyman, which combines Thompson sampling with classical Neyman optimal allocation. Each unknown within-stratum variance is replaced by an independent draw from an inverse-chi-squared posterior, and the next label goes to the stratum with the largest one-step marginal variance reduction under those draws. For any fixed finite number of strata, the authors prove that the allocation proportions converge almost surely to the Neyman target, that the variance proxy is asymptotically optimal, and that the adaptive stratified estimator satisfies a central limit theorem. Empirically, TS-Neyman's relative efficiency stays within 5% of the oracle on a bounded-loss benchmark, within about 15% on a binary model-error benchmark, and within about 8% on a CivilComments real-data replay, where it also improves on equal allocation by roughly 7 to 14% in MSE. Plug-in greedy and two-stage plug-in allocation, by contrast, can degrade by over an order of magnitude when pilot data are sparse.
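As a worked illustration of how the marginal-gain index relates to the Neyman target, the derivation below assumes the textbook stratified-mean variance with known stratum weights W_h = N_h/N, within-stratum variances σ_h², and per-stratum sample sizes n_h. The paper's exact working model (cost weights, corrections) may differ, so treat this as a sketch.

```latex
% Stratified estimator and its variance (no finite-population correction)
\hat\mu = \sum_{h=1}^{K} W_h \bar y_h, \qquad
V(n_1,\dots,n_K) = \sum_{h=1}^{K} \frac{W_h^2 \sigma_h^2}{n_h}

% Exact one-step reduction from adding one label to stratum h
\Delta_h(n_h) = W_h^2 \sigma_h^2 \left( \frac{1}{n_h} - \frac{1}{n_h + 1} \right)
             = \frac{W_h^2 \sigma_h^2}{n_h (n_h + 1)}

% Continuous minimizer of V subject to \sum_h n_h = n (Lagrange condition W_h^2 \sigma_h^2 / n_h^2 = \lambda)
n_h^\star = n \, \frac{W_h \sigma_h}{\sum_{k=1}^{K} W_k \sigma_k}

% TS-Neyman-style rule: plug posterior draws into the same index
h_{\text{next}} = \arg\max_h \frac{W_h^2 \tilde\sigma_h^2}{n_h (n_h + 1)}, \qquad
\tilde\sigma_h^2 \sim p(\sigma_h^2 \mid \text{data})
```

With the true σ_h, assigning labels one at a time to the largest Δ_h is the standard greedy (marginal-allocation) method for this separable convex problem, and its allocation tracks n_h^⋆ as the budget grows. Adding a finite-population correction only subtracts the constant Σ_h W_h² σ_h²/N_h from V, so the Δ_h are unchanged.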

📝 Abstract

Many model evaluation tasks reduce to estimating an average loss, error rate, or subgroup metric on a stratified pool when each label, human rating, or simulator call is costly. The precision-optimal Neyman allocation depends on within-stratum variances, which must be learned from the same observations used for estimation. We formulate this as a sequential allocation problem and use the exact one-step marginal variance reduction as the priority index. Replacing the unknown variances by independent inverse-chi-squared posterior draws yields TS-Neyman, a Thompson-sampling rule that preserves the oracle marginal-gain structure while randomizing over variance uncertainty. For any fixed finite number of strata, we prove almost-sure convergence of the TS-Neyman allocation proportions to the Neyman target, asymptotic optimality of the variance proxy, and a central limit theorem for the resulting adaptive stratified estimator. In two five-stratum budget-scaling benchmarks (a bounded-loss benchmark and a binary model-error benchmark in the spirit of Dai et al. 2023), TS-Neyman's relative efficiency stays within 5 percent of the oracle on the bounded-loss population and within about 15 percent on the binary benchmark. In an additional CivilComments real-data replay with confidence-based strata, it stays within about 8 percent of the oracle and improves on equal allocation by roughly 7 to 14 percent in MSE across budgets, while plug-in greedy and two-stage plug-in allocation can degrade by over an order of magnitude under sparse pilots. Common-pilot warm-start and prior-sensitivity studies show that this behavior is stable under working-model and working-prior misspecification.
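The sequential rule described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Each stratum is modeled as normal with a flat prior on its mean and a scaled-inverse-chi-squared prior on its variance (hypothetical hyperparameters `nu0`, `s0_sq`), so the marginal posterior of σ_h² is scaled-inverse-chi-squared with ν_0 + n_h − 1 degrees of freedom. The names `ts_neyman`, `neyman_proportions`, `samplers`, and `n_pilot` are hypothetical, and the variance objective is Σ_h W_h² σ_h²/n_h without finite-population correction.

```python
import numpy as np


def neyman_proportions(weights, sigmas):
    """Oracle Neyman target: n_h / n is proportional to W_h * sigma_h."""
    a = np.asarray(weights, dtype=float) * np.asarray(sigmas, dtype=float)
    return a / a.sum()


def ts_neyman(samplers, weights, budget, n_pilot=2, nu0=1.0, s0_sq=1.0, seed=0):
    """Hypothetical TS-Neyman-style allocation loop (illustrative sketch).

    samplers[h](rng) returns one costly label from stratum h.
    Returns (stratified estimate, variance proxy, per-stratum counts).
    """
    rng = np.random.default_rng(seed)
    W = np.asarray(weights, dtype=float)
    K = len(W)
    if budget < K * n_pilot:
        raise ValueError("budget must cover the pilot samples")
    count = np.zeros(K)
    mean = np.zeros(K)
    m2 = np.zeros(K)  # Welford running sum of squared deviations per stratum

    def observe(h):
        y = samplers[h](rng)
        count[h] += 1
        delta = y - mean[h]
        mean[h] += delta / count[h]
        m2[h] += delta * (y - mean[h])

    # Pilot (assumed warm start): n_pilot labels per stratum so variances exist.
    for h in range(K):
        for _ in range(n_pilot):
            observe(h)

    for _ in range(budget - K * n_pilot):
        # Assumed working model: sigma_h^2 | data ~ Scaled-Inv-chi^2(nu_h, tau_h^2)
        # with nu_h = nu0 + n_h - 1 and nu_h * tau_h^2 = nu0 * s0_sq + m2_h,
        # so one posterior draw is (nu0 * s0_sq + m2_h) / chi^2_{nu_h}.
        nu = nu0 + count - 1
        sigma2_draw = (nu0 * s0_sq + m2) / rng.chisquare(nu)
        # One-step reduction of V = sum_h W_h^2 sigma_h^2 / n_h from one more label in h.
        gain = W**2 * sigma2_draw / (count * (count + 1))
        observe(int(np.argmax(gain)))

    estimate = float(W @ mean)
    s2 = m2 / (count - 1)  # unbiased within-stratum sample variances
    var_proxy = float(np.sum(W**2 * s2 / count))
    return estimate, var_proxy, count.astype(int)


if __name__ == "__main__":
    # Five simulated normal strata; weights, means and SDs are made up for illustration.
    W = [0.3, 0.25, 0.2, 0.15, 0.1]
    mus = [0.1, 0.2, 0.3, 0.4, 0.5]
    sds = [0.5, 1.0, 2.0, 0.2, 3.0]
    samplers = [lambda r, m=m, s=s: r.normal(m, s) for m, s in zip(mus, sds)]
    est, var, counts = ts_neyman(samplers, W, budget=5000)
    print("estimate", est, "+/-", 1.96 * var**0.5)
    print("realized proportions", counts / counts.sum())
    print("Neyman target       ", neyman_proportions(W, sds))
```

Running statistics are updated with Welford's method so each step costs O(K) rather than rescanning all labels. On a budget of a few thousand labels, the realized proportions in this toy setup should land close to `neyman_proportions`. The ±1.96·√(variance proxy) interval leans on a normal approximation, which the paper's central limit theorem is meant to justify under its own assumptions.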

Problem

Research questions and friction points this paper is trying to address.

stratified estimation

Neyman allocation

variance estimation

adaptive sampling

costly labeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Thompson sampling

adaptive stratified estimation

Neyman allocation