🤖 AI Summary
This paper investigates the feasibility of deterministic learning in binary online classification under “apple tasting” feedback, where labels are revealed only upon predicting 1. In the realizable setting, we establish for the first time that deterministic learners achieve the same learnability as randomized ones. We construct the first universal deterministic apple tasting learner, attaining a mistake bound of $O(\sqrt{\mathtt{L}(\mathcal{H})\, T \log T})$, where $\mathtt{L}(\mathcal{H})$ is the Littlestone dimension of the hypothesis class $\mathcal{H}$; this bound is tight for some classes. Our analysis integrates Littlestone dimension theory, deterministic expert frameworks, and combinatorial online learning techniques. In the agnostic setting, where the best hypothesis makes at most $k$ mistakes, we fully characterize the complexity landscape into three regimes: easy, hard, and unlearnable. For each regime, we provide matching upper and lower bounds for both deterministic and randomized algorithms, resolving the fundamental limits of deterministic learning under apple tasting feedback.
📝 Abstract
In binary ($0/1$) online classification with apple tasting feedback, the learner receives feedback only when predicting $1$. Besides some degenerate learning tasks, all previously known learning algorithms for this model are randomized. Consequently, prior to this work it was unknown whether deterministic apple tasting is generally feasible. In this work, we provide the first widely-applicable deterministic apple tasting learner, and show that in the realizable case, a hypothesis class is learnable if and only if it is deterministically learnable, confirming a conjecture of [Raman, Subedi, Raman, Tewari-24]. Quantitatively, we show that every class $\mathcal{H}$ is learnable with mistake bound $O\left(\sqrt{\mathtt{L}(\mathcal{H})\, T \log T}\right)$ (where $\mathtt{L}(\mathcal{H})$ is the Littlestone dimension of $\mathcal{H}$), and that this is tight for some classes. We further study the agnostic case, in which the best hypothesis makes at most $k$ mistakes, and prove a trichotomy stating that every class $\mathcal{H}$ must be either easy, hard, or unlearnable. Easy classes have (both randomized and deterministic) mistake bound $\Theta_{\mathcal{H}}(k)$. Hard classes have randomized mistake bound $\tilde{\Theta}_{\mathcal{H}}\left(k + \sqrt{T}\right)$, and deterministic mistake bound $\tilde{\Theta}_{\mathcal{H}}\left(\sqrt{k \cdot T}\right)$, where $T$ is the time horizon. Unlearnable classes have (both randomized and deterministic) mistake bound $\Theta(T)$. Our upper bound is based on a deterministic algorithm for learning from expert advice with apple tasting feedback, a problem interesting in its own right. For this problem, we show that the optimal deterministic mistake bound is $\Theta\left(\sqrt{T (k + \log n)}\right)$ for all $k$ and $T \leq n \leq 2^T$, where $n$ is the number of experts.
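To make the feedback model concrete, the following is a minimal toy simulation of apple tasting with expert advice. It is *not* the paper's algorithm: it runs a naive majority vote over experts not yet caught erring, and it illustrates the key asymmetry, namely that the true label (and hence the chance to eliminate wrong experts) is observed only on rounds where the learner predicts $1$, while mistakes on rounds where it predicts $0$ still count but go unseen. All names here (`apple_tasting_run`, the input format) are illustrative assumptions.

```python
# Toy illustration of apple-tasting feedback with experts (not the paper's
# algorithm): the learner observes the true label only when it predicts 1.

def apple_tasting_run(expert_preds, true_labels):
    """expert_preds: one list of 0/1 predictions per expert;
    true_labels: list of 0/1 outcomes, same length as each prediction list.
    Returns (total mistakes, set of experts never observed to err)."""
    n = len(expert_preds)
    alive = set(range(n))  # experts not yet *observed* making a mistake
    mistakes = 0
    for t, y in enumerate(true_labels):
        votes = sum(expert_preds[i][t] for i in alive)
        pred = 1 if 2 * votes >= len(alive) else 0  # majority of alive experts
        if pred == 1:
            # Feedback revealed: count the mistake (if any) and eliminate
            # every alive expert whose prediction disagrees with y.
            if y != 1:
                mistakes += 1
            alive = {i for i in alive if expert_preds[i][t] == y}
        else:
            # No feedback: a mistake here (y == 1) counts toward the mistake
            # bound, but the learner never learns it happened.
            if y == 1:
                mistakes += 1
    return mistakes, alive
```

For example, with a perfect expert and an always-$1$ expert on labels $1, 0, 1$, the learner errs once (round two, where it predicts $1$ and observes $0$), and that one observation suffices to eliminate the always-$1$ expert.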