🤖 AI Summary
This work addresses representation drift and catastrophic forgetting in exemplar-free class-incremental learning (EFCIL), where past data cannot be stored. The authors show that existing one-directional drift-compensation projections introduce systematic bias, and propose BiCyc, a bidirectional projector alignment method with stop-gradient gating and a cycle-consistency loss. The method jointly optimizes an old-to-new map and a new-to-old map so that transport and representation co-evolve. Theoretical analysis shows that the cycle loss contracts the singular spectrum toward unity in whitened space, and that better transport of class means and covariances yields smaller perturbations of classification log-odds. Empirically, BiCyc substantially reduces forgetting and improves accuracy on standard EFCIL benchmarks in from-scratch settings, while remaining competitive in the pretrained fine-grained regime.
📝 Abstract
Continual learning (CL) seeks models that acquire new skills without erasing prior knowledge. In exemplar-free class-incremental learning (EFCIL), this challenge is amplified because past data cannot be stored, making representation drift for old classes particularly harmful. Prototype-based EFCIL is attractive for its efficiency, yet prototypes drift as the embedding space evolves; therefore, projection-based drift compensation has become a popular remedy. We show, however, that existing one-directional projections introduce systematic bias: they either retroactively distort the current feature geometry or align past classes only locally, leaving cycle inconsistencies that accumulate across tasks. We introduce BiCyc, a bidirectional projector alignment approach with a cycle-consistency objective. BiCyc jointly optimizes two maps, old-to-new and new-to-old, with stop-gradient gating so that transport and representation co-evolve. Analytically, we show that the cycle loss contracts the singular spectrum toward unity in whitened space, and that improved transport of class means and covariances yields smaller perturbations of classification log-odds, preserving old-class decisions and mitigating catastrophic forgetting. Empirically, across standard EFCIL benchmarks, BiCyc substantially reduces forgetting and improves accuracy in from-scratch settings, while remaining competitive in the pretrained fine-grained regime.
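The cycle-consistency idea can be illustrated with a minimal numerical sketch; this is not the authors' implementation. It assumes linear projectors, hypothetically named `A` (old-to-new) and `B` (new-to-old), and whitened features with E[x xᵀ] = I, under which the expected cycle loss reduces to ‖BA − I‖²_F + ‖AB − I‖²_F. Features are treated as fixed, so only the projectors receive gradients; this only loosely mirrors the stop-gradient gating mentioned in the abstract, whose exact form is not specified there. Dimension, step size, and initialization are illustrative choices.

```python
import numpy as np


def cycle_loss(A, B):
    """Expected cycle loss for whitened features (assumes E[x x^T] = I)."""
    I = np.eye(A.shape[0])
    return np.sum((B @ A - I) ** 2) + np.sum((A @ B - I) ** 2)


def cycle_grads(A, B):
    """Analytic gradients of cycle_loss with respect to A and B."""
    I = np.eye(A.shape[0])
    r_old = B @ A - I  # residual of the old -> new -> old round trip
    r_new = A @ B - I  # residual of the new -> old -> new round trip
    grad_A = 2 * (B.T @ r_old + r_new @ B.T)
    grad_B = 2 * (r_old @ A.T + A.T @ r_new)
    return grad_A, grad_B


def spectral_gap(A, B):
    """Largest deviation from 1 among the singular values of B @ A."""
    s = np.linalg.svd(B @ A, compute_uv=False)
    return np.max(np.abs(s - 1.0))


# Hypothetical setup: two imperfect projectors near the identity.
rng = np.random.default_rng(0)
d = 4
A = np.eye(d) + 0.2 * rng.standard_normal((d, d))
B = np.eye(d) + 0.2 * rng.standard_normal((d, d))

loss_before, gap_before = cycle_loss(A, B), spectral_gap(A, B)

lr = 0.02  # illustrative step size
for _ in range(2000):
    grad_A, grad_B = cycle_grads(A, B)
    A -= lr * grad_A
    B -= lr * grad_B

loss_after, gap_after = cycle_loss(A, B), spectral_gap(A, B)
print(f"cycle loss:   {loss_before:.4f} -> {loss_after:.2e}")
print(f"spectral gap: {gap_before:.4f} -> {gap_after:.2e}")
```

In this toy setting, minimizing the cycle loss should pull the singular values of `B @ A` toward 1, which matches the contraction toward unity that the abstract describes analytically. A real EFCIL pipeline would estimate the projectors from backbone features at each task rather than from random matrices.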