🤖 AI Summary
This paper addresses catastrophic forgetting in compositional zero-shot learning (CZSL) when vision-language models (VLMs) continually adapt to novel attributes, objects, and their compositions. To tackle this, the authors propose PromptCCZSL, a framework built around a synergistic mechanism between session-aware compositional prompts and session-agnostic attribute/object prompts. It incorporates multi-teacher distillation with a frozen VLM backbone, multimodal feature-fusion prompt learning, and a joint optimization strategy comprising a Cosine Anchor Loss, an Orthogonal Projection Loss, and an Intra-Session Diversity Loss. Evaluated on UT-Zappos and C-GQA, PromptCCZSL significantly outperforms existing VLM-based and non-VLM baselines. The paper also introduces the first comprehensive benchmark for continual CZSL (CCZSL), explicitly designed to assess both anti-forgetting capability and compositional generalization, establishing a new paradigm for continual compositional learning.
📝 Abstract
We tackle the continual adaptation of vision-language models (VLMs) to new attributes, objects, and their compositions in Compositional Zero-Shot Learning (CZSL), while preventing forgetting of prior knowledge. Unlike classical continual learning, where classes are disjoint, continual CZSL (CCZSL) is more complex: attributes and objects may recur across sessions while compositions remain unique. Built on a frozen VLM backbone, we propose the first Prompt-based Continual Compositional Zero-Shot Learning (PromptCCZSL) framework, which retains prior knowledge through recency-weighted multi-teacher distillation. It employs session-aware compositional prompts to fuse multimodal features for new compositions, while attribute and object prompts are learned through session-agnostic fusion to maintain global semantic consistency, further stabilized by a Cosine Anchor Loss (CAL) that preserves prior knowledge. To enhance adaptation in the current session, an Orthogonal Projection Loss (OPL) keeps new attribute and object embeddings distinct from previous ones, preventing overlap, while an Intra-Session Diversity Loss (IDL) promotes variation among current-session embeddings for richer, more discriminative representations. We also introduce a comprehensive evaluation protocol that jointly measures catastrophic forgetting and compositional generalization. Extensive experiments on the UT-Zappos and C-GQA benchmarks demonstrate that PromptCCZSL achieves substantial improvements over prior VLM-based and non-VLM baselines, setting a new benchmark for CCZSL in closed-world settings.
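To make the roles of the three losses concrete, here is a minimal NumPy sketch of plausible forms for CAL, OPL, and IDL. The abstract does not give the exact formulations, so the cosine-based definitions below (and the function names) are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def _cos(a, b):
    # Cosine similarity between two vectors (small epsilon avoids divide-by-zero).
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def cosine_anchor_loss(embs, anchors):
    """CAL (assumed form): pull current attribute/object embeddings toward
    their frozen anchors from earlier sessions via 1 - cosine similarity."""
    return float(np.mean([1.0 - _cos(e, a) for e, a in zip(embs, anchors)]))

def orthogonal_projection_loss(new_embs, prev_embs):
    """OPL (assumed form): penalize overlap between new and previous
    embeddings by driving their squared cosine similarity toward zero."""
    sims = [_cos(n, p) ** 2 for n in new_embs for p in prev_embs]
    return float(np.mean(sims))

def intra_session_diversity_loss(embs):
    """IDL (assumed form): penalize pairwise similarity among
    current-session embeddings so they stay mutually distinct."""
    sims = [_cos(embs[i], embs[j]) ** 2
            for i in range(len(embs)) for j in range(i + 1, len(embs))]
    return float(np.mean(sims))
```

Under these definitions, CAL is zero when an embedding matches its anchor, OPL is zero when new embeddings are orthogonal to all previous ones, and IDL is zero when current-session embeddings are mutually orthogonal, which matches the abstract's description of anchoring, non-overlap, and diversity.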