🤖 AI Summary
To address performance degradation in text-based person search (TBPS) under complex scenes—caused by uncertainties in detection and cross-modal matching—this paper proposes UPD-TBPS, an uncertainty-aware framework. Methodologically, it introduces (1) a novel multi-granularity uncertainty estimation mechanism to explicitly quantify confidence in both detection and cross-modal matching, and (2) a prototype semantic disentanglement architecture that hierarchically models coarse-grained cluster prototypes and fine-grained individual prototypes, thereby decoupling visual context and enabling confidence-aware matching. The framework is trained end-to-end, jointly optimizing multi-granularity textual queries, prototype mining, cross-modal re-identification, and uncertainty modeling. Extensive experiments on CUHK-SYSU-TBPS and PRW-TBPS demonstrate significant improvements in mAP and top-1 accuracy, validating that uncertainty disentanglement simultaneously enhances localization robustness and matching precision.
📝 Abstract
Text-based person search (TBPS) in full images aims to locate a target pedestrian in untrimmed scene images using a natural language description. However, in complex scenes containing multiple pedestrians, existing methods are limited by uncertainties in both detection and matching, leading to degraded performance. To address this, we propose UPD-TBPS, a novel framework comprising three modules: Multi-granularity Uncertainty Estimation (MUE), Prototype-based Uncertainty Decoupling (PUD), and Cross-modal Re-identification (ReID). MUE issues multi-granularity queries to identify potential targets and assigns confidence scores to them, reducing early-stage uncertainty. PUD leverages visual context decoupling and prototype mining to extract features of the pedestrian described in the query; it separates and learns prototype representations at both the coarse-grained cluster level and the fine-grained individual level, thereby reducing matching uncertainty. ReID then evaluates candidates with varying confidence levels, improving both detection and retrieval accuracy. Experiments on the CUHK-SYSU-TBPS and PRW-TBPS datasets validate the effectiveness of our framework.
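The abstract describes fusing detection confidence with cross-modal matching when ranking candidates. The sketch below illustrates that general idea only; the fusion rule, function names, and the weighting parameter `alpha` are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def rank_candidates(query_emb, cand_embs, det_conf, alpha=0.5):
    """Hypothetical confidence-aware ranking of detected candidates.

    Combines cosine similarity between a text-query embedding and each
    candidate's visual embedding with the detector's confidence score,
    so low-confidence detections are down-weighted. The linear fusion
    with weight `alpha` is an assumed stand-in for the paper's
    uncertainty-aware scoring.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    sim = c @ q                              # cosine similarity per candidate
    score = alpha * sim + (1 - alpha) * np.asarray(det_conf)
    order = np.argsort(-score)               # indices, best match first
    return order, score
```

In this toy form, a candidate whose appearance matches the description well but whose detection is uncertain can still be outranked by a slightly worse match with a reliable detection, which is the qualitative behavior the framework's confidence scores are meant to enable.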