🤖 AI Summary
To address catastrophic forgetting in cross-domain continual learning, this paper proposes a domain-aware adapter architecture tailored for Vision Transformers (ViTs). It introduces lightweight adapters into ViT self-attention layers and integrates domain-specific feature gating with dynamic multi-head output routing, explicitly modeling how task ordering modulates knowledge stability. Unlike conventional single-domain continual learning paradigms, the approach enables parameter-efficient and structurally controllable cross-domain knowledge retention. Evaluated on a heterogeneous sequential benchmark comprising CIFAR-100, Flowers102, and DTD, it achieves an average accuracy gain of over 8% compared to state-of-the-art parameter-efficient fine-tuning (PEFT) methods. The results demonstrate significantly mitigated forgetting and improved generalization, validating the effectiveness of co-designing task-ordering strategies with domain-aware architectural mechanisms.
📝 Abstract
Continual learning enables models to learn from a continuous stream of data while preserving previously acquired knowledge, with catastrophic forgetting as its central challenge. In this study, we propose a new approach that integrates adapters within the self-attention mechanisms of Vision Transformers to improve knowledge retention when datasets from different domains are added sequentially. Unlike previous methods that perform continual learning within a single dataset, our approach introduces domain-specific output heads and feature gating, allowing the model to maintain high accuracy on previously learned tasks while incorporating only the essential information from each new domain. We compare the proposed method against prominent state-of-the-art parameter-efficient fine-tuning (PEFT) methods, and the results show that it effectively alleviates the limitations of prior work. Furthermore, we conduct a comparative analysis on three datasets, CIFAR-100, Flowers102, and DTD, each representing a distinct domain, to investigate the impact of task order on model performance. Our findings underscore the critical role of dataset sequencing in shaping learning outcomes: strategic ordering significantly improves the model's ability to adapt to evolving data distributions over time while preserving previously learned knowledge.
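To make the architectural idea concrete, the following is a minimal NumPy sketch of the three components the abstract describes: a bottleneck adapter with a residual connection, a per-domain feature gate, and per-domain output heads. All dimensions, parameter names, and the use of a sigmoid gate are illustrative assumptions, not the paper's actual implementation; a real model would insert such adapters into frozen ViT self-attention layers and train only the adapter, gate, and head parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

D, R = 16, 4          # token dimension and adapter bottleneck (assumed sizes)
NUM_DOMAINS = 3       # e.g. CIFAR-100, Flowers102, DTD

# Hypothetical trainable parameters: one bottleneck adapter per layer,
# plus a feature gate and a classification head for each domain.
W_down = rng.standard_normal((D, R)) * 0.1
W_up   = rng.standard_normal((R, D)) * 0.1
gates  = {d: rng.standard_normal(D) for d in range(NUM_DOMAINS)}
heads  = {d: rng.standard_normal((D, 10)) * 0.1 for d in range(NUM_DOMAINS)}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adapter_block(x, domain):
    """Bottleneck adapter with residual connection, followed by a
    domain-specific sigmoid gate over feature channels."""
    h = np.maximum(x @ W_down, 0.0) @ W_up   # down-project, ReLU, up-project
    x = x + h                                # residual keeps the frozen path intact
    return x * sigmoid(gates[domain])        # gate selects domain-relevant features

def forward(tokens, domain):
    """Route pooled, gated features to the head of the requested domain."""
    feat = adapter_block(tokens, domain)
    return feat.mean(axis=0) @ heads[domain]

tokens = rng.standard_normal((8, D))         # 8 tokens from a frozen ViT layer
logits = forward(tokens, domain=1)
print(logits.shape)  # (10,)
```

Because only the adapter, gates, and heads are updated while the backbone stays frozen, each new domain adds a small parameter budget, and earlier domains keep their own gate and head untouched, which is the mechanism the abstract credits for reduced forgetting.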