Two-Action Apple Tasting with Switching Costs

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies the minimax regret of the two-action apple-tasting problem with unit switching costs against an oblivious adversary. In each round the learner chooses between a revealing action, which yields zero reward but reveals the hidden value of the other action, and a blind action, which yields that adversarially chosen value as reward but gives no feedback. General feedback-graph algorithms with switching costs guarantee only Õ(T^{2/3}) regret here, and this graph was the natural candidate for a matching Ω(T^{2/3}) lower bound. The paper shows that no such lower bound holds: the minimax expected regret scales as Θ(√T), with the explicit bounds (1/(2√3))√T ≤ R_T^* ≤ 2√3 √T. This rules out the two-action apple-tasting graph as the source of a T^{2/3} obstruction in the classification of feedback graphs with switching costs.

📝 Abstract

We study the two-action apple-tasting problem with switching costs against an oblivious adversary. In an equivalent normalized formulation, at each round the learner chooses between a revealing action and a blind action: the revealing action gives reward $0$ and reveals the hidden value $x_t\in[-1,1]$ of the blind action; the blind action gives reward $x_t$ but reveals nothing. The learner pays one unit whenever they switch actions, and regret is measured against the best fixed action in hindsight. General feedback-graph algorithms with switching costs give $\widetilde O(T^{2/3})$ regret guarantees for this problem. The two-action apple-tasting graph was the natural candidate for the missing $\Omega(T^{2/3})$ obstruction in the switching-cost classification: such a lower bound would have transferred to a large family of still-unclassified feedback graphs. We prove that this obstruction is not there: the oblivious minimax expected regret for this problem satisfies \[ \frac{1}{2\sqrt3}\cdot\sqrt T \le R_T^\star \le 2\sqrt{3}\cdot \sqrt{T}. \]
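To make the normalized formulation concrete, the following is a minimal Python sketch of the quantities in the abstract. It is not taken from the paper. The names `REVEAL`, `BLIND`, `realized_regret`, and `minimax_bounds` are hypothetical. The sketch assumes the first round incurs no switching cost and that the fixed comparator never switches. It computes the realized regret of a single action sequence, whereas $R_T^\star$ concerns the expectation over the learner's randomness.

```python
import math
from typing import Sequence, Tuple

# Hypothetical labels for the two actions (not from the paper).
REVEAL, BLIND = 0, 1


def realized_regret(actions: Sequence[int], x: Sequence[float]) -> float:
    """Realized regret of one action sequence on a fixed sequence x.

    Assumptions of this sketch: the first round incurs no switching
    cost, and the best fixed action in hindsight pays no switching cost.
    """
    if len(actions) != len(x):
        raise ValueError("actions and x must have the same length")

    # The revealing action earns 0; the blind action earns x_t.
    reward = sum(xt for a, xt in zip(actions, x) if a == BLIND)

    # One unit is paid each time consecutive actions differ.
    switches = sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)

    # Best fixed action: always reveal (total 0) or always blind (sum of x).
    best_fixed = max(0.0, sum(x))

    return best_fixed - (reward - switches)


def minimax_bounds(T: int) -> Tuple[float, float]:
    """Lower and upper bounds on R_T^* as stated in the abstract."""
    root = math.sqrt(T)
    return root / (2 * math.sqrt(3)), 2 * math.sqrt(3) * root
```

For example, with `actions = [REVEAL, BLIND, BLIND, REVEAL]` and `x = [0.5, 1.0, -1.0, 0.5]`, the learner collects reward 0, pays 2 for switching, and the best fixed action earns 1, so the realized regret is 3. The two constants in `minimax_bounds` differ by a factor of $2\sqrt3 \cdot 2\sqrt3 = 12$ for every $T$.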

Problem

Research questions and friction points this paper is trying to address.

apple-tasting

switching costs

regret minimization

oblivious adversary

feedback graphs

Innovation

Methods, ideas, or system contributions that make the work stand out.

apple-tasting

switching costs

minimax regret