🤖 AI Summary
This work addresses the challenges of multi-label class-incremental learning: the high computational cost of full-parameter fine-tuning, the substantial memory overhead of replay buffers, and the problems of feature confusion and domain shift. To overcome these limitations, the authors propose a replay-free, parameter-efficient framework that integrates class-specific prompts with lightweight continuous adapters. A Prompt-to-Label module disentangles multi-label representations and incorporates linguistic priors to achieve semantic-visual alignment, while the Continuous Adapter mitigates domain discrepancies between the pre-trained model and downstream tasks, improving model plasticity. Evaluated on MS-COCO and PASCAL VOC under both standard and challenging incremental settings, the method achieves state-of-the-art performance with a minimal number of trainable parameters and generalizes well beyond the multi-label setting.
📝 Abstract
Multi-label Class-Incremental Learning (MLCIL) aims to continuously recognize novel categories in complex scenes where multiple objects co-occur. However, existing approaches often incur high computational costs due to full-parameter fine-tuning and substantial storage overhead from memory buffers, or they struggle to adequately address feature confusion and domain discrepancies. To overcome these limitations, we introduce P2L-CA, a parameter-efficient framework that integrates a Prompt-to-Label (P2L) module with a Continuous Adapter (CA) module. The P2L module leverages class-specific prompts to disentangle multi-label representations while incorporating linguistic priors to enforce stable semantic-visual alignment. Meanwhile, the CA module employs lightweight adapters to mitigate domain gaps between pre-trained models and downstream tasks, thereby enhancing model plasticity. Extensive experiments across standard and challenging MLCIL settings on MS-COCO and PASCAL VOC show that P2L-CA not only achieves substantial improvements over state-of-the-art methods but also demonstrates strong generalization in class-incremental learning (CIL) scenarios, all while requiring minimal trainable parameters and eliminating the need for memory buffers.
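To make the two components concrete, the following is a minimal NumPy sketch of the general pattern the abstract describes: a residual bottleneck adapter refines frozen visual tokens, each class-specific prompt attends over those tokens to pool a per-class feature, and each feature is scored against a fixed linguistic (text) embedding to produce an independent per-label probability. All dimensions, weight initializations, and the exact attention/scoring functions here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bottleneck_adapter(x, w_down, w_up):
    """Residual bottleneck adapter: x + up(relu(down(x))).

    A common lightweight-adapter form; assumed here, not taken from the paper.
    """
    return x + np.maximum(x @ w_down, 0.0) @ w_up

# Hypothetical sizes: 5 classes, 16-d features, 4-d bottleneck, 10 visual tokens.
num_classes, d, r, n_tokens = 5, 16, 4, 10

patch_feats = rng.normal(size=(n_tokens, d))       # tokens from a frozen backbone
class_prompts = rng.normal(size=(num_classes, d))  # learnable class-specific prompts
text_priors = rng.normal(size=(num_classes, d))    # fixed linguistic embeddings
w_down = 0.1 * rng.normal(size=(d, r))             # adapter down-projection
w_up = 0.1 * rng.normal(size=(r, d))               # adapter up-projection

# CA-style step: adapt the frozen features toward the downstream domain.
adapted = bottleneck_adapter(patch_feats, w_down, w_up)

# P2L-style step: each class prompt attends over adapted tokens to pool
# a class-specific feature, decoupling the multi-label representation.
attn = softmax(class_prompts @ adapted.T / np.sqrt(d))  # (num_classes, n_tokens)
class_feats = attn @ adapted                            # (num_classes, d)

# Score each class feature against its linguistic prior (cosine similarity),
# then apply an independent sigmoid per label, as in multi-label recognition.
def cosine(a, b):
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))

logits = cosine(class_feats, text_priors)
probs = 1.0 / (1.0 + np.exp(-logits))
print(probs.shape)  # (5,)
```

Because only the prompts and the two small adapter matrices would be trainable (d*r*2 + num_classes*d parameters here), this pattern keeps the trainable footprint tiny relative to the frozen backbone, which is the parameter-efficiency argument the abstract makes.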