Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images

📅 2025-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address performance degradation in text-based person search (TBPS) under complex scenes—caused by uncertainties in detection and cross-modal matching—this paper proposes UPD-TBPS, an uncertainty-aware framework. Methodologically, it introduces (1) a novel multi-granularity uncertainty estimation mechanism to explicitly quantify confidence in both detection and cross-modal matching, and (2) a prototype semantic disentanglement architecture that hierarchically models coarse-grained cluster prototypes and fine-grained individual prototypes, thereby decoupling visual context and enabling confidence-aware matching. The framework is trained end-to-end, jointly optimizing multi-granularity textual queries, prototype mining, cross-modal re-identification, and uncertainty modeling. Extensive experiments on CUHK-SYSU-TBPS and PRW-TBPS demonstrate significant improvements in mAP and top-1 accuracy, validating that uncertainty disentanglement simultaneously enhances localization robustness and matching precision.
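The summary describes assigning confidence scores to detected candidates and then performing confidence-aware cross-modal matching. A minimal sketch of that idea, assuming a simple multiplicative weighting of cosine similarity by detection confidence (the function name and weighting scheme are illustrative assumptions, not the paper's actual formulation):

```python
import numpy as np

def confidence_weighted_scores(text_emb, cand_embs, det_conf):
    """Score candidates by cosine similarity, down-weighted by detection confidence.

    Hypothetical sketch: the paper's MUE module is more elaborate than a
    single multiplicative weight.
    """
    t = text_emb / np.linalg.norm(text_emb)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    return det_conf * (c @ t)  # low-confidence detections are penalized

# Toy example: two detected candidates in a full image.
text = np.array([1.0, 0.0])
cands = np.array([[1.0, 0.0], [0.0, 1.0]])
scores = confidence_weighted_scores(text, cands, np.array([0.9, 0.99]))
best = int(np.argmax(scores))  # candidate 0: similar to the text AND confidently detected
```

The point of such a scheme is that an uncertain detection cannot dominate retrieval even if its appearance embedding happens to match the query well.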

📝 Abstract
Text-based pedestrian search (TBPS) in full images aims to locate a target pedestrian in untrimmed images using natural language descriptions. However, in complex scenes with multiple pedestrians, existing methods are limited by uncertainties in detection and matching, leading to degraded performance. To address this, we propose UPD-TBPS, a novel framework comprising three modules: Multi-granularity Uncertainty Estimation (MUE), Prototype-based Uncertainty Decoupling (PUD), and Cross-modal Re-identification (ReID). MUE conducts multi-granularity queries to identify potential targets and assigns confidence scores to reduce early-stage uncertainty. PUD leverages visual context decoupling and prototype mining to extract features of the target pedestrian described in the query. It separates and learns pedestrian prototype representations at both the coarse-grained cluster level and the fine-grained individual level, thereby reducing matching uncertainty. ReID evaluates candidates with varying confidence levels, improving detection and retrieval accuracy. Experiments on CUHK-SYSU-TBPS and PRW-TBPS datasets validate the effectiveness of our framework.
Problem

Research questions and friction points this paper is trying to address.

Locate target pedestrians in untrimmed images using text descriptions
Reduce uncertainties in detection and matching in complex scenes
Improve accuracy via multi-granularity uncertainty estimation and decoupling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-granularity Uncertainty Estimation for target confidence
Prototype-based Uncertainty Decoupling for feature extraction
Cross-modal Re-identification for improved accuracy
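The bullets above outline a coarse-to-fine prototype hierarchy: match against cluster-level prototypes first, then rank individuals within the selected cluster. A toy sketch of that two-stage lookup, assuming mean-pooled cluster prototypes and cosine similarity (all names and the pooling choice are assumptions for illustration):

```python
import numpy as np

def hierarchical_match(text_emb, cand_embs, cluster_ids):
    """Coarse-to-fine matching: cluster prototypes first, individuals second.

    Illustrative sketch only; the paper's PUD module learns prototypes
    rather than mean-pooling fixed embeddings.
    """
    t = text_emb / np.linalg.norm(text_emb)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    # Coarse stage: mean-pool each cluster into a prototype and pick the
    # cluster whose prototype is most similar to the text embedding.
    clusters = np.unique(cluster_ids)
    protos = np.stack([c[cluster_ids == k].mean(axis=0) for k in clusters])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    best_cluster = clusters[np.argmax(protos @ t)]
    # Fine stage: rank individual candidates inside the chosen cluster only.
    idx = np.where(cluster_ids == best_cluster)[0]
    return idx[np.argmax(c[idx] @ t)]

# Toy example: four candidates in two clusters of two.
text = np.array([1.0, 0.0])
cands = np.array([[1.0, 0.1], [0.9, 0.2], [0.0, 1.0], [0.1, 0.9]])
match = int(hierarchical_match(text, cands, np.array([0, 0, 1, 1])))
```

Pruning at the cluster level before individual-level ranking is what lets the coarse prototypes absorb scene-level ambiguity while the fine prototypes resolve identity.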
Zengli Luo
Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
Canlong Zhang
Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
Xiaochun Lu
Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
Zhixin Li
Syracuse University School of Information Studies
Social Machines, Children, Non-human Relationship
Zhiwen Wang
PhD, Sichuan University
Continual learning, Image processing, MRI, Inverse problem