🤖 AI Summary
To address performance degradation in text-based person search (TBPS) under complex scenes—caused by uncertainties in detection and cross-modal matching—this paper proposes UPD-TBPS, an uncertainty-aware framework. Methodologically, it introduces (1) a novel multi-granularity uncertainty estimation mechanism to explicitly quantify confidence in both detection and cross-modal matching, and (2) a prototype semantic disentanglement architecture that hierarchically models coarse-grained cluster prototypes and fine-grained individual prototypes, thereby decoupling visual context and enabling confidence-aware matching. The framework is trained end-to-end, jointly optimizing multi-granularity textual queries, prototype mining, cross-modal re-identification, and uncertainty modeling. Extensive experiments on CUHK-SYSU-TBPS and PRW-TBPS demonstrate significant improvements in mAP and top-1 accuracy, validating that uncertainty disentanglement simultaneously enhances localization robustness and matching precision.
📝 Abstract
Text-based person search (TBPS) in full images aims to locate a target pedestrian in untrimmed scene images using a natural language description. However, in complex scenes containing multiple pedestrians, existing methods are limited by uncertainties in both detection and matching, leading to degraded performance. To address this, we propose UPD-TBPS, a novel framework comprising three modules: Multi-granularity Uncertainty Estimation (MUE), Prototype-based Uncertainty Decoupling (PUD), and Cross-modal Re-identification (ReID). MUE issues multi-granularity queries to identify potential targets and assigns confidence scores to them, reducing early-stage uncertainty. PUD leverages visual context decoupling and prototype mining to extract features of the pedestrian described in the query; it separates and learns prototype representations at both the coarse-grained cluster level and the fine-grained individual level, thereby reducing matching uncertainty. ReID then evaluates candidates with varying confidence levels, improving both detection and retrieval accuracy. Experiments on the CUHK-SYSU-TBPS and PRW-TBPS datasets validate the effectiveness of our framework.
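The abstract describes fusing detection confidence with cross-modal matching when ranking candidates. The sketch below illustrates that general idea only; the fusion rule, function names, and the weighting parameter `alpha` are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def rank_candidates(query_emb, cand_embs, det_conf, alpha=0.5):
    """Hypothetical confidence-aware ranking of detected candidates.

    Combines cosine similarity between a text-query embedding and each
    candidate's visual embedding with the detector's confidence score,
    so low-confidence detections are down-weighted. The linear fusion
    with weight `alpha` is an assumed stand-in for the paper's
    uncertainty-aware scoring.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    sim = c @ q                              # cosine similarity per candidate
    score = alpha * sim + (1 - alpha) * np.asarray(det_conf)
    order = np.argsort(-score)               # indices, best match first
    return order, score
```

In this toy form, a candidate whose appearance matches the description well but whose detection is uncertain can still be outranked by a slightly worse match with a reliable detection, which is the qualitative behavior the framework's confidence scores are meant to enable.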