🤖 AI Summary
To address weak generalization in pedestrian attribute recognition (PAR) caused by too few samples of rare attributes, this paper proposes a data-centric synthetic augmentation method. After identifying weakly recognized attributes, it introduces text-driven diffusion models to PAR for the first time, using prompt engineering to generate semantically consistent pedestrian images with controllable attributes. A prompt-guided, label-aware loss reweighting strategy dynamically amplifies the supervision signal for rare attributes. A synthetic data fusion mechanism then enables end-to-end training without modifying the underlying model architecture. Experiments on benchmark datasets, including PA-100K and RAPv2, show significant gains in rare-attribute recognition (+5.2% mAP) and in zero-shot generalization. The results indicate that the method is effective, robust, and scalable across datasets, supporting its practical utility for real-world PAR applications.
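The summary does not spell out the form of the label-aware loss reweighting. The sketch below is one plausible reading, assuming a masked, per-attribute weighted binary cross-entropy: synthetic samples carry labels only for attributes stated in their generation prompt, and unspecified attributes (marked `None`) are excluded from the loss. The function names `rarity_weights` and `label_aware_bce`, the weighting rule (negative-to-positive ratio clamped to `[1, cap]`), and the `None` convention are hypothetical choices for illustration, not the paper's formulation.

```python
import math
from typing import List, Optional, Sequence


def rarity_weights(
    pos_counts: Sequence[int], num_samples: int, cap: float = 10.0
) -> List[float]:
    """Hypothetical positive-class weights: negative-to-positive ratio,
    clamped to [1, cap] so that only rare attributes are amplified."""
    weights = []
    for c in pos_counts:
        c = max(c, 1)  # avoid division by zero for attributes with no positives
        neg_to_pos = (num_samples - c) / c
        weights.append(max(1.0, min(neg_to_pos, cap)))
    return weights


def label_aware_bce(
    probs: Sequence[float],
    labels: Sequence[Optional[int]],
    pos_weights: Sequence[float],
    eps: float = 1e-7,
) -> float:
    """Masked, weighted binary cross-entropy for a single sample.

    labels[j] is 1 or 0 when attribute j is known (a real annotation or a
    value stated in the generation prompt) and None when the prompt leaves
    it unspecified; unspecified attributes contribute nothing.
    """
    total, count = 0.0, 0
    for p, y, w in zip(probs, labels, pos_weights):
        if y is None:  # hypothetical rule: prompt is silent, so no supervision
            continue
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        # The positive term is scaled by w; the negative term is unweighted.
        total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0
```

For example, with 100 training images, an attribute with 50 positives keeps weight 1.0, while one with 5 positives would get 19.0 before being capped at 10.0. Masking unspecified attributes avoids treating "not mentioned in the prompt" as a negative label, which could otherwise penalize correct predictions on synthetic images.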
📝 Abstract
Pedestrian Attribute Recognition (PAR) is a challenging task because models must generalize across numerous attributes in real-world data. Traditional approaches focus on complex model designs, yet recognition performance is often constrained by limitations of the training data, particularly the under-representation of certain attributes. In this paper, we propose a data-centric approach that improves PAR through synthetic data augmentation guided by textual descriptions. First, we define a protocol to identify weakly recognized attributes across multiple datasets. Second, we propose a prompt-driven pipeline that leverages diffusion models to generate synthetic pedestrian images while preserving consistency with existing PAR datasets. Finally, we derive a strategy to seamlessly incorporate synthetic samples into the training data, which accounts for prompt-based annotation rules and modifies the loss function. Results on popular PAR datasets demonstrate that our approach not only boosts recognition of underrepresented attributes but also improves overall model performance beyond the targeted attributes. Notably, it strengthens zero-shot generalization without requiring architectural changes to the model, offering an efficient and scalable way to improve pedestrian attribute recognition in the real world.
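The abstract does not state how the protocol decides that an attribute is weakly recognized. A minimal sketch, assuming the standard label-based mean accuracy (mA, the average of an attribute's true-positive and true-negative rates) as the per-attribute score, and a hypothetical rule that flags an attribute when its mA falls below `threshold` on at least `min_datasets` datasets. Both function names and both parameters are illustrative assumptions, not the paper's protocol.

```python
import math
from typing import Dict, List, Sequence


def attribute_mean_accuracy(
    preds: Sequence[Sequence[int]], gts: Sequence[Sequence[int]]
) -> List[float]:
    """Label-based mA per attribute: the mean of its TPR and TNR."""
    scores = []
    for j in range(len(gts[0])):
        tp = sum(1 for p, g in zip(preds, gts) if p[j] == 1 and g[j] == 1)
        tn = sum(1 for p, g in zip(preds, gts) if p[j] == 0 and g[j] == 0)
        pos = sum(1 for g in gts if g[j] == 1)
        neg = len(gts) - pos
        # An attribute with no positives (or no negatives) has an undefined
        # rate; NaN propagates and is never flagged by weak_attributes below.
        tpr = tp / pos if pos else math.nan
        tnr = tn / neg if neg else math.nan
        scores.append((tpr + tnr) / 2)
    return scores


def weak_attributes(
    per_dataset_ma: Dict[str, Dict[str, float]],
    threshold: float = 0.75,
    min_datasets: int = 2,
) -> List[str]:
    """Hypothetical rule: flag attributes whose mA is below `threshold`
    on at least `min_datasets` of the evaluated datasets."""
    counts: Dict[str, int] = {}
    for scores in per_dataset_ma.values():
        for attr, ma in scores.items():
            if ma < threshold:  # NaN < threshold is False, so NaN is skipped
                counts[attr] = counts.get(attr, 0) + 1
    return sorted(a for a, c in counts.items() if c >= min_datasets)
```

Requiring the low score to recur on several datasets (`min_datasets > 1`) is one way to target attributes that are hard in general rather than artifacts of a single dataset; the threshold and the rule itself would need tuning against the actual benchmarks.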