🤖 AI Summary
This work addresses the problem of efficiently learning linear classifiers in an online setting where Massart noise and concept drift coexist. For margin-separable halfspaces, the authors give a computationally efficient algorithm with prediction error η + Õ(Δ^{1/3}/γ), where η upper bounds the Massart noise rate, Δ is the drift rate, and γ is the margin. An adaptation of the same techniques to the realizable setting yields an efficient learner with an improved error rate over prior work. On the lower-bound side, the information-theoretically optimal error scales with Δ^{1/2}, but the authors prove that Δ^{1/3} scaling is unavoidable for low-degree polynomial tests, even under random classification noise. This is formal evidence of an information-computation trade-off and strongly suggests that the algorithm's error is essentially optimal for efficient learners.
📝 Abstract
We study the problem of learning a drifting concept in the presence of Massart noise. In this framework, an online learner has access to a history of independent samples whose labels are noisy versions of a target concept that may change from round to round. The goal is to output, in each round, a hypothesis with small prediction error. We study the complexity of this learning problem for the fundamental class of margin-separable linear classifiers (halfspaces). On the positive side, we give a computationally efficient learner achieving error $η + \tilde{O}(Δ^{1/3}/γ)$, where $η$ upper bounds the Massart noise rate, $Δ$ is the drift rate, and $γ$ is the margin. Interestingly, in the realizable setting, an adaptation of our techniques yields an efficient learner with an improved error rate over prior work. On the lower-bound side, we provide formal evidence of an information-computation tradeoff, strongly suggesting that our algorithm's performance is essentially optimal. Specifically, while the information-theoretically optimal error scales with $Δ^{1/2}$, we prove that $Δ^{1/3}$-scaling is unavoidable for low-degree polynomial tests, even in the special case of random classification noise.
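To get a feel for the gap between the two scalings, the following minimal sketch evaluates the drift-dependent terms numerically. It is an illustration only: the function names are hypothetical, constants and the logarithmic factors hidden in $\tilde{O}$ are dropped, and the abstract does not state how the information-theoretic rate depends on $η$ or $γ$, so only the $Δ$ factors are compared.

```python
def efficient_error_bound(eta: float, delta: float, gamma: float) -> float:
    """Shape of the upper bound eta + delta^(1/3) / gamma.

    Assumption: constants and log factors from the O-tilde are ignored,
    so this shows the scaling only, not the actual guarantee.
    """
    return eta + delta ** (1.0 / 3.0) / gamma


def computational_scaling(delta: float) -> float:
    """Drift factor delta^(1/3), the scaling attained by the efficient learner."""
    return delta ** (1.0 / 3.0)


def information_theoretic_scaling(delta: float) -> float:
    """Drift factor delta^(1/2), the information-theoretically optimal scaling."""
    return delta ** 0.5


if __name__ == "__main__":
    # Illustrative drift rates; these values are not taken from the paper.
    for delta in (1e-3, 1e-6, 1e-9):
        comp = computational_scaling(delta)
        info = information_theoretic_scaling(delta)
        print(f"delta={delta:.0e}  delta^(1/3)={comp:.2e}  "
              f"delta^(1/2)={info:.2e}  ratio={comp / info:.1f}")
```

The ratio between the two factors is $Δ^{-1/6}$, so the gap widens as the drift rate shrinks: the slower the concept drifts, the more an efficient learner gives up relative to the information-theoretic optimum.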