🤖 AI Summary
VISUALCENT addresses the limited generalizability of pose estimation and instance segmentation, as well as insufficient robustness to occlusion and motion, in multi-person visual analysis. It proposes a unified bottom-up framework centered on a dynamic centroid representation mechanism: introducing KeyCentroid (keypoint centroid) and MaskCentroid (mask centroid), jointly leveraging disk-shaped heatmap modeling and explicit centroid-driven pixel clustering to enable co-optimization of keypoint detection and instance segmentation. This paradigm significantly enhances resilience to severe occlusion and rapid, large-scale motion. Evaluated on COCO and OCHuman benchmarks, VISUALCENT achieves state-of-the-art performance in both mAP and FPS, enabling real-time, high-accuracy multi-person analysis. The implementation is publicly available.
📝 Abstract
We introduce VISUALCENT, a unified human pose and instance segmentation framework that addresses generalizability and scalability limitations in multi-person visual human analysis. VISUALCENT leverages a centroid-based bottom-up keypoint detection paradigm and uses a Keypoint Heatmap incorporating a Disk Representation and KeyCentroid to identify optimal keypoint coordinates. For the unified segmentation task, an explicit keypoint is defined as a dynamic centroid, called MaskCentroid, that swiftly clusters pixels to a specific human instance during rapid changes in body movement or in heavily occluded environments. Experimental results on the COCO and OCHuman datasets demonstrate VISUALCENT's accuracy and real-time performance advantages, outperforming existing methods in both mAP score and execution frame rate (FPS). The implementation is available on the project page.
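The centroid-driven clustering idea can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the per-pixel offset representation, and the nearest-centroid assignment rule are assumptions; the sketch only shows how pixels can be grouped to instances by comparing each pixel's predicted centroid against detected instance centroids.

```python
import numpy as np

def cluster_pixels_to_instances(offsets, centroids, fg_mask):
    """Illustrative (hypothetical) centroid-driven pixel clustering.

    offsets:   (H, W, 2) per-pixel offset vectors pointing toward the
               instance centroid (a stand-in for MaskCentroid regression).
    centroids: (N, 2) detected instance centroids in (row, col) order.
    fg_mask:   (H, W) boolean foreground (person) mask.
    Returns an (H, W) int map: instance index per pixel, -1 for background.
    """
    H, W = fg_mask.shape
    ys, xs = np.nonzero(fg_mask)
    coords = np.stack([ys, xs], axis=1).astype(np.float32)
    # Each foreground pixel votes for a centroid location.
    pred = coords + offsets[ys, xs]
    # Distance from each vote to every instance centroid (broadcasting).
    dists = np.linalg.norm(pred[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.full((H, W), -1, dtype=np.int32)
    labels[ys, xs] = dists.argmin(axis=1)
    return labels
```

With zero offsets this degenerates to nearest-centroid assignment; in a learned setting the offsets would pull occluded or fast-moving body pixels toward the correct instance center.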