🤖 AI Summary
In online continual learning (CL), existing methods are hard to compare because they operate under heterogeneous computational (FLOPs) and memory (bytes) budgets, stemming from differing single-pass training constraints and replay buffer sizes; moreover, implicit overheads such as logit caching and model duplication are frequently overlooked. This work proposes a fair evaluation framework jointly constrained by total FLOPs and total memory capacity. Methodologically: (1) a gradient-sensitivity-driven adaptive layer freezing mechanism eliminates redundant computation on less informative batches; (2) a frequency-weighted replay retrieval strategy improves how much knowledge is reused per iteration. Evaluated on CIFAR-10/100, CLEAR-10/100, and ImageNet-1K, the approach consistently surpasses state-of-the-art methods under identical total resource budgets, improving classification accuracy and training efficiency simultaneously.
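The frequency-weighted retrieval idea in point (2) can be sketched as a buffer that tracks how often each stored sample has been replayed and down-weights frequently used ones. This is an illustrative simplification, not the paper's exact formulation: the class name, the `1 / (1 + use_count)` weighting, and all parameters below are assumptions.

```python
import random

class FrequencyWeightedBuffer:
    """Replay buffer that favors less-frequently replayed samples.

    Illustrative sketch: retrieval probability is proportional to
    1 / (1 + use_count), so rarely replayed samples surface more often.
    """

    def __init__(self):
        self.samples = []    # stored payloads, e.g. (x, y) pairs
        self.use_count = []  # how many times each sample was retrieved

    def add(self, sample):
        self.samples.append(sample)
        self.use_count.append(0)

    def retrieve(self, k):
        # Weight each stored sample inversely to its replay frequency.
        weights = [1.0 / (1 + c) for c in self.use_count]
        idx = random.choices(range(len(self.samples)), weights=weights, k=k)
        for i in idx:
            self.use_count[i] += 1
        return [self.samples[i] for i in idx]
```

With a fresh buffer all weights are equal, so the first retrieval reduces to uniform random sampling; the weights then diverge as training proceeds, steering replay toward under-used samples.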
📝 Abstract
Most online continual learning (CL) methods advocate single-epoch training and restrict the size of the replay memory. However, single-epoch training incurs a different amount of computation per CL algorithm, and the additional storage cost of keeping logits or model copies alongside the replay memory is largely ignored when calculating the storage budget. Arguing that these differing computational and storage budgets hinder fair comparison among CL algorithms in practice, we propose to use floating point operations (FLOPs) and total memory size in bytes as the metrics for computational and memory budgets, respectively, to compare and develop CL algorithms within the same 'total resource budget.' To improve a CL method under a limited total budget, we propose adaptive layer freezing, which does not update the layers for less informative batches, reducing computational cost with a negligible loss of accuracy. In addition, we propose a memory retrieval method that allows the model to learn the same amount of knowledge as random retrieval in fewer iterations. Empirical validations on the CIFAR-10/100, CLEAR-10/100, and ImageNet-1K datasets demonstrate that the proposed approach outperforms state-of-the-art methods within the same total budget.
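As a rough illustration of adaptive layer freezing, the sketch below picks how many leading layers to freeze for a given batch by trading estimated information gain against backward-pass FLOPs. The scoring rule `info − λ·FLOPs`, the per-layer information fractions, and all names are hypothetical simplifications of the batch-informativeness criterion described above, not the paper's actual algorithm.

```python
def select_freeze_depth(batch_info, layer_bwd_flops, layer_info_frac, lam):
    """Choose how many leading layers to freeze for one batch.

    batch_info:      scalar informativeness of the batch (assumed given,
                     e.g. derived from its loss or gradient magnitude).
    layer_bwd_flops: backward-pass FLOPs of each layer, input to output.
    layer_info_frac: fraction of the batch's information attributed to
                     each layer (assumed to sum to ~1).
    lam:             trade-off weight between information and FLOPs.
    """
    best_n, best_score = 0, float("-inf")
    for n in range(len(layer_bwd_flops) + 1):
        flops_spent = sum(layer_bwd_flops[n:])          # cost of updating layers n..end
        info_kept = batch_info * sum(layer_info_frac[n:])  # info still absorbed
        score = info_kept - lam * flops_spent
        if score > best_score:
            best_n, best_score = n, score
    return best_n
```

Under this toy rule the behavior is adaptive in the intended direction: a highly informative batch keeps all layers unfrozen, while a barely informative batch freezes everything, saving its backward FLOPs for future batches.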