🤖 AI Summary
Existing lifelong person re-identification methods fail to effectively leverage fine-grained attribute knowledge embedded in vision-language models, thereby limiting their cross-domain transferability and resistance to catastrophic forgetting. This work proposes VLADR, the first approach to integrate vision-language models into lifelong person re-identification by introducing multi-granularity textual attribute disentanglement and cross-modal, cross-domain attribute alignment mechanisms. These components enable historical knowledge to better support learning on new tasks. The proposed method substantially enhances both generalization and memory retention capabilities, outperforming state-of-the-art approaches by 1.9%–2.2% in anti-forgetting performance and by 2.1%–2.5% in generalization accuracy.
📝 Abstract
Lifelong person re-identification (LReID) aims to learn from varying domains to obtain a unified person retrieval model. Existing LReID approaches typically learn from scratch or from a visual-classification-pretrained model, whereas Vision-Language Models (VLMs) have demonstrated generalizable knowledge across a variety of tasks. Although existing methods can be directly adapted to a VLM, they consider only global-aware learning, so fine-grained attribute knowledge is underleveraged, limiting both knowledge acquisition and anti-forgetting capacity. To address this problem, we introduce a novel VLM-driven LReID approach named Vision-Language Attribute Disentanglement and Reinforcement (VLADR). Our key idea is to explicitly model universally shared human attributes to improve inter-domain knowledge transfer, thereby effectively utilizing historical knowledge to reinforce new knowledge learning and alleviate forgetting. Specifically, VLADR includes a Multi-grain Text Attribute Disentanglement mechanism that mines the global and diverse local text attributes of an image. An Inter-domain Cross-modal Attribute Reinforcement scheme is then developed, which introduces cross-modal attribute alignment to guide visual attribute extraction and adopts inter-domain attribute alignment to achieve fine-grained knowledge transfer. Experimental results demonstrate that VLADR outperforms state-of-the-art methods by 1.9%–2.2% in anti-forgetting and by 2.1%–2.5% in generalization capacity. Our source code is available at https://github.com/zhoujiahuan1991/CVPR2026-VLADR.
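The abstract's cross-modal attribute alignment can be illustrated with a minimal sketch: one common formulation aligns matched visual and text attribute embeddings by minimizing one minus their cosine similarity. The function and variable names below are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize embeddings to unit length along the last axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cross_modal_alignment_loss(visual_attrs, text_attrs):
    """Mean (1 - cosine similarity) over matched attribute pairs.
    Approaches 0 as visual attribute features align with their
    corresponding text attribute embeddings."""
    v = l2_normalize(visual_attrs)
    t = l2_normalize(text_attrs)
    cos = np.sum(v * t, axis=-1)          # per-attribute cosine similarity
    return float(np.mean(1.0 - cos))

# Toy example: 4 attribute slots (e.g., head, torso, legs, global),
# each a 16-dim embedding.
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 16))
aligned_loss = cross_modal_alignment_loss(text, text)  # identical pairs
random_loss = cross_modal_alignment_loss(rng.standard_normal((4, 16)), text)
print(aligned_loss, random_loss)
```

In a training loop, minimizing such a loss pulls each visual attribute feature toward its textual counterpart, which is one plausible way to let text-side attribute knowledge guide visual attribute extraction.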