SapiensID: Foundation for Human Recognition

📅 2025-04-07
🤖 AI Summary
Existing human recognition systems typically handle face and body recognition with separate models, which limits generalization under real-world conditions such as pose variation, scale variation, and partial visibility. This work introduces SapiensID, a unified model for human recognition, together with a new Cross Pose-Scale ReID challenge. Methodologically, it combines Retina Patch (RP), a dynamic patch generation scheme that adapts to subject scale; a masked recognition model (MRM) that learns from variable-length token sequences; and a Semantic Attention Head (SAH) that pools features around key body parts to obtain pose-invariant representations. The model is built on a Transformer architecture and trained on the large-scale WebBody4M dataset. It achieves state-of-the-art results on various body ReID benchmarks, outperforms specialized models in both short-term and long-term scenarios while remaining competitive with dedicated face recognition systems, and establishes a strong baseline for Cross Pose-Scale ReID.

📝 Abstract
Existing human recognition systems often rely on separate, specialized models for face and body analysis, limiting their effectiveness in real-world scenarios where pose, visibility, and context vary widely. This paper introduces SapiensID, a unified model that bridges this gap, achieving robust performance across diverse settings. SapiensID introduces (i) Retina Patch (RP), a dynamic patch generation scheme that adapts to subject scale and ensures consistent tokenization of regions of interest, (ii) a masked recognition model (MRM) that learns from variable token lengths, and (iii) Semantic Attention Head (SAH), a module that learns pose-invariant representations by pooling features around key body parts. To facilitate training, we introduce WebBody4M, a large-scale dataset capturing diverse poses and scale variations. Extensive experiments demonstrate that SapiensID achieves state-of-the-art results on various body ReID benchmarks, outperforming specialized models in both short-term and long-term scenarios while remaining competitive with dedicated face recognition systems. Furthermore, SapiensID establishes a strong baseline for the newly introduced challenge of Cross Pose-Scale ReID, demonstrating its ability to generalize to complex, real-world conditions.
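The property the abstract attributes to Retina Patch is that a region of interest is tokenized consistently regardless of subject scale. A minimal sketch of that property, assuming a plain uniform grid over a bounding box. The function `grid_patches`, the `Box` layout, and the example coordinates are hypothetical and not taken from the paper, whose actual patch scheme may differ.

```python
from typing import List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1) in pixels.
Box = Tuple[float, float, float, float]


def grid_patches(roi: Box, grid: int) -> List[Box]:
    """Split a region of interest into grid x grid equally sized patches.

    Assumption: a uniform grid stands in for the paper's patch scheme.
    """
    x0, y0, x1, y1 = roi
    if grid <= 0 or x1 <= x0 or y1 <= y0:
        raise ValueError("roi must have positive area and grid must be positive")
    pw = (x1 - x0) / grid  # patch width scales with the region
    ph = (y1 - y0) / grid  # patch height scales with the region
    return [
        (x0 + c * pw, y0 + r * ph, x0 + (c + 1) * pw, y0 + (r + 1) * ph)
        for r in range(grid)
        for c in range(grid)
    ]


# A small and a large region both produce the same number of patches.
small_face = grid_patches((100.0, 40.0, 132.0, 72.0), grid=4)
large_face = grid_patches((0.0, 0.0, 256.0, 256.0), grid=4)
```

Because the patch size follows the region rather than the image, a distant subject and a close-up subject contribute the same number of tokens for the same region, which is one plausible reading of "consistent tokenization".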
Problem

Research questions and friction points this paper is trying to address.

Unified model for human recognition across varying poses and visibility
Dynamic patch generation for consistent region tokenization
Pose-invariant feature learning for robust real-world performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic patch generation for scale adaptation
Masked recognition model for variable tokens
Semantic attention for pose-invariant features
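The last two items can be illustrated together: pooling token features around a body keypoint while ignoring padded positions in a variable-length sequence. This is only a sketch under stated assumptions. The distance-based softmax weighting below is a hand-written stand-in for a learned attention head, and `keypoint_pool`, `temperature`, and the normalized coordinates are hypothetical names and conventions rather than anything specified by the paper.

```python
import math
from typing import List, Sequence, Tuple


def keypoint_pool(
    tokens: Sequence[Sequence[float]],
    centers: Sequence[Tuple[float, float]],
    valid: Sequence[bool],
    keypoint: Tuple[float, float],
    temperature: float = 0.1,
) -> List[float]:
    """Softmax-weighted average of valid tokens, favoring tokens near a keypoint.

    Assumption: closeness to the keypoint replaces a learned attention score.
    """
    scores = []
    for (cx, cy), ok in zip(centers, valid):
        if ok:
            dist2 = (cx - keypoint[0]) ** 2 + (cy - keypoint[1]) ** 2
            scores.append(-dist2 / temperature)
        else:
            scores.append(-math.inf)  # padded tokens receive zero weight
    top = max(scores)
    if top == -math.inf:
        raise ValueError("at least one token must be valid")
    # Subtract the maximum score for numerical stability before exponentiating.
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(tokens[0])
    return [
        sum(w * tok[d] for w, tok in zip(weights, tokens)) / total
        for d in range(dim)
    ]


# Two real tokens plus one padded slot; the keypoint sits on the first token.
tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
centers = [(0.2, 0.2), (0.8, 0.8), (0.0, 0.0)]
valid = [True, True, False]
pooled = keypoint_pool(tokens, centers, valid, keypoint=(0.2, 0.2))
```

The pooled vector is dominated by the token nearest the keypoint, and the padded slot has no influence on it, so sequences of different lengths can share one pooling routine.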
Authors
Minchul Kim
Department of Computer Science and Engineering, Michigan State University
Dingqiang Ye
Department of Computer Science and Engineering, Michigan State University
Yiyang Su
Michigan State University
Feng Liu
Department of Computer Science, Drexel University
Xiaoming Liu
Department of Computer Science and Engineering, Michigan State University