DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval

📅 2025-03-06
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the high computational cost and overfitting risks of full-model fine-tuning in text-based person retrieval (TPR), as well as the limited fine-grained representation capability of existing parameter-efficient transfer learning (PETL) methods, this paper proposes the Domain-Aware Mixture-of-Adapters (DM-Adapter) to efficiently inject pedestrian-domain knowledge into CLIP. The key contributions are: (1) a sparse mixture-of-adapters placed in parallel with the MLP layers, coupled with a domain-aware router, enabling expert specialization and dynamic input routing; and (2) learnable domain-aware prompts integrated with a novel gating function to strengthen fine-grained feature extraction. Extensive experiments demonstrate state-of-the-art performance across multiple benchmarks with over 80% fewer trainable parameters, while mitigating overfitting and routing imbalance, two key limitations of prior MoE-based PETL approaches.

๐Ÿ“ Abstract
Text-based person retrieval (TPR) has gained significant attention as a fine-grained and challenging task that closely aligns with practical applications. Tailoring CLIP to the person domain is an emerging research topic due to the abundant knowledge from vision-language pretraining, but challenges remain during fine-tuning: (i) previous full-model fine-tuning in TPR is computationally expensive and prone to overfitting; (ii) existing parameter-efficient transfer learning (PETL) for TPR lacks fine-grained feature extraction. To address these issues, we propose the Domain-Aware Mixture-of-Adapters (DM-Adapter), which unifies Mixture-of-Experts (MoE) and PETL to enhance fine-grained feature representations while maintaining efficiency. Specifically, a Sparse Mixture-of-Adapters is designed in parallel with the MLP layers in both the vision and language branches, where different experts specialize in distinct aspects of person knowledge to handle features more finely. To encourage the router to exploit domain information effectively and to alleviate routing imbalance, a Domain-Aware Router is developed by building a novel gating function and injecting learnable domain-aware prompts. Extensive experiments show that our DM-Adapter achieves state-of-the-art performance, outperforming previous methods by a significant margin.
Problem

Research questions and friction points this paper is trying to address.

Efficient fine-tuning for text-based person retrieval
Enhancing fine-grained feature extraction in TPR
Addressing overfitting and computational cost in CLIP adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-Aware Mixture-of-Adapters enhances fine-grained feature extraction.
Sparse Mixture-of-Adapters runs in parallel with MLP layers for efficiency.
Domain-Aware Router optimizes routing with learnable domain-aware prompts.
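The design above (bottleneck adapters as experts, a router conditioned on a learnable domain prompt, sparse top-k gating, and a residual branch parallel to the MLP) can be sketched as follows. This is a minimal NumPy illustration of the general mechanism, not the paper's implementation: dimensions, initialization, the exact gating function, and the way the domain prompt enters the router are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class BottleneckAdapter:
    """Standard PETL adapter: down-project -> ReLU -> up-project."""
    def __init__(self, d_model, d_bottleneck):
        self.W_down = rng.normal(0, 0.02, (d_model, d_bottleneck))
        self.W_up = rng.normal(0, 0.02, (d_bottleneck, d_model))

    def __call__(self, x):
        return np.maximum(x @ self.W_down, 0.0) @ self.W_up

class SparseMixtureOfAdapters:
    """Sketch of a sparse mixture-of-adapters branch: a router scores each
    token (here, conditioned on a hypothetical learnable domain prompt),
    routes it to its top-k adapter experts, and adds the gated expert
    outputs back to the token, in parallel with the frozen MLP path."""
    def __init__(self, d_model, d_bottleneck, n_experts, top_k=2):
        self.experts = [BottleneckAdapter(d_model, d_bottleneck)
                        for _ in range(n_experts)]
        self.W_router = rng.normal(0, 0.02, (d_model, n_experts))
        self.domain_prompt = rng.normal(0, 0.02, (d_model,))  # assumption
        self.top_k = top_k

    def __call__(self, x):
        # x: (n_tokens, d_model); routing logits see token + domain prompt
        logits = (x + self.domain_prompt) @ self.W_router
        gates = softmax(logits)                        # (n_tokens, n_experts)
        topk = np.argsort(gates, axis=-1)[:, -self.top_k:]
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            sel = topk[t]
            w = gates[t, sel] / gates[t, sel].sum()    # renormalize top-k
            for e, g in zip(sel, w):
                out[t] += g * self.experts[e](x[t:t+1])[0]
        return x + out  # residual branch alongside the MLP output

moa = SparseMixtureOfAdapters(d_model=16, d_bottleneck=4, n_experts=4)
tokens = rng.normal(size=(5, 16))
y = moa(tokens)
print(y.shape)  # (5, 16)
```

Only the adapter, router, and prompt parameters would be trained; the CLIP backbone stays frozen, which is where the large reduction in trainable parameters comes from. The paper's Domain-Aware Router additionally uses a novel gating function to balance expert load, which the plain softmax here does not capture.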