AI Summary
This work addresses the problem of selecting the largest among $n$ unknown values, each observed only through a single unbiased estimate. The authors propose an adaptive weighted averaging strategy grounded in statistical decision theory. The method achieves a "no-compromise" guarantee: it is admissible, meaning no other strategy uniformly dominates it, and it is never worse than a given baseline such as uniform random selection, while it can substantially outperform that baseline in benign settings. Applied to stochastic optimization, the approach yields online-to-batch conversion bounds that are never worse than standard random iterate selection and can be significantly better under benign conditions.
Abstract
We study the problem of selecting the largest among $n$ unknown values $x_1,\dots,x_n$ given only a single unbiased estimate $y_i$ for each $x_i$. We design strategies that are simultaneously admissible (not uniformly dominated by any other strategy) and also never worse than a given baseline such as uniform random selection. We provide an application to stochastic optimization, where we obtain online-to-batch conversion bounds with a desirable "no-compromise" guarantee: they are never worse than standard random iterate selection, and yet can be significantly better in benign settings.
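To make the problem setup concrete, the following minimal sketch simulates it and compares the uniform random selection baseline with a naive rule that picks the index of the largest estimate. It is an illustration only, not the strategy designed in the paper: the Gaussian noise model, the function names, and the example values are all assumptions introduced here, and the naive rule is not claimed to be admissible or to beat the baseline in every instance.

```python
import random


def select_uniform(y, rng):
    """Baseline: ignore the estimates and pick an index uniformly at random."""
    return rng.randrange(len(y))


def select_greedy(y, rng):
    """Naive rule (illustrative only): pick the index with the largest estimate."""
    return max(range(len(y)), key=lambda i: y[i])


def expected_value(x, noise_std, select, trials=20000, seed=0):
    """Monte Carlo estimate of E[x_I], where I is the index chosen by `select`.

    Assumption: y_i = x_i + Gaussian noise with mean zero, so each y_i is an
    unbiased estimate of x_i. The source only requires unbiasedness.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        y = [xi + rng.gauss(0.0, noise_std) for xi in x]
        total += x[select(y, rng)]
    return total / trials


# Hypothetical instance: one value stands out from the rest.
x = [0.0, 0.0, 0.0, 1.0]
uniform_score = expected_value(x, noise_std=0.5, select=select_uniform)
greedy_score = expected_value(x, noise_std=0.5, select=select_greedy)
print(f"uniform: {uniform_score:.3f}  greedy: {greedy_score:.3f}")
```

In this benign instance the estimates are informative, so using them helps. The guarantee described in the abstract is stronger than this comparison shows: it asks for a strategy that is never worse than the baseline for any values $x_1,\dots,x_n$, while also being admissible.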