🤖 AI Summary
In online learning, the data distribution can shift arbitrarily in both magnitude and pace, which makes it hard to select the right attention span, that is, how much history a learner should use. Method: This paper proposes a black-box meta-algorithmic framework that integrates multi-resolution online learner instances, cross-validation through time, and wavelet-theory-inspired sliding windows over historical data to enable the first theoretically guaranteed, fully automatic online selection of attention span. The approach requires no white-box access to the underlying learner and maintains only $O(\log T)$ lightweight learner instances. Contribution/Results: Evaluated on diverse real-world text and image datasets with heterogeneous data sources, the method consistently improves classification accuracy across various online learning algorithms. It is computationally efficient and robust to distribution shifts, offering a practical and theoretically sound solution for adaptive attention span selection in non-stationary environments.
📝 Abstract
We study the well-motivated problem of online distribution shift in which the data arrive in batches and the distribution of each batch can change arbitrarily over time. Since the shifts can be large or small, abrupt or gradual, the length of the relevant historical data to learn from may vary over time, which poses a major challenge in designing algorithms that can automatically adapt to the best ``attention span'' while remaining computationally efficient. We propose a meta-algorithm that takes any network architecture and any Online Learner (OL) algorithm as input and produces a new algorithm which provably enhances the performance of the given OL under non-stationarity. Our algorithm is efficient (it requires maintaining only $O(\log T)$ OL instances) and adaptive (it automatically chooses OL instances with the ideal ``attention'' length at every timestamp). Experiments on various real-world datasets across text and image modalities show that our method consistently improves the accuracy of user-specified OL algorithms for classification tasks. Key novel algorithmic ingredients include a \emph{multi-resolution instance} design inspired by wavelet theory and a cross-validation-through-time technique. Both could be of independent interest.
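To make the idea of keeping $O(\log T)$ learner instances with different attention spans concrete, here is a minimal Python sketch. It is not the paper's algorithm: the dyadic restart schedule (level $k$ restarts every $2^k$ batches), the rule of picking the instance with the lowest error on the newest batch before training on it, and all names (`MajorityLearner`, `DyadicEnsemble`, `count_mistakes`) are assumptions chosen for illustration. The toy learner simply predicts the most frequent label it has seen since its last restart.

```python
import math
from collections import Counter


class MajorityLearner:
    """Toy online learner (hypothetical): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def update(self, labels):
        self.counts.update(labels)

    def predict(self):
        # Default to label 0 before any data has been seen.
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def error(self, labels):
        guess = self.predict()
        return sum(y != guess for y in labels) / len(labels)


class DyadicEnsemble:
    """Keeps floor(log2(horizon)) + 1 learners; level k restarts every 2**k batches.

    Assumed design for illustration only, not the authors' construction.
    """

    def __init__(self, make_learner, horizon):
        self.make_learner = make_learner
        self.num_levels = int(math.log2(horizon)) + 1
        self.learners = [make_learner() for _ in range(self.num_levels)]
        self.best = self.num_levels - 1  # start with the longest history
        self.t = 0

    def predict(self):
        return self.learners[self.best].predict()

    def observe(self, labels):
        # Score every instance on the new batch BEFORE it trains on that batch,
        # so the score reflects out-of-sample performance on the latest data.
        errors = [learner.error(labels) for learner in self.learners]
        # Lowest error wins; ties go to the instance with the longer span.
        self.best = min(range(self.num_levels), key=lambda k: (errors[k], -k))
        for k in range(self.num_levels):
            if self.t % (2 ** k) == 0:
                self.learners[k] = self.make_learner()  # forget old history
            self.learners[k].update(labels)
        self.t += 1


def count_mistakes(model, stream):
    """Count batches where the model's prediction is wrong on most labels."""
    mistakes = 0
    for labels in stream:
        guess = model.predict()
        wrong = sum(y != guess for y in labels)
        mistakes += int(2 * wrong > len(labels))
        model.observe(labels)
    return mistakes
```

For example, on a stream of eight all-zero batches followed by eight all-one batches, `count_mistakes(DyadicEnsemble(MajorityLearner, horizon=16), stream)` lets the ensemble switch to a short-history instance shortly after the shift, whereas a single learner trained on the full history keeps predicting the old label for the rest of the stream.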