AI Summary
To address performance degradation in multi-person parsing caused by occlusion in crowded scenes, this paper proposes a weakly supervised multi-person parsing method that leverages multi-view RGB+D data. The method introduces three key contributions: (1) a multi-view consistency loss that enforces geometric agreement across views, improving segmentation robustness in occluded regions; (2) a semi-automatic annotation strategy for efficiently generating instance- and part-level masks; and (3) a joint training framework that combines 3D human skeletons with RGB+D features to perform instance and part segmentation together. Evaluated in occlusion scenarios, the approach achieves up to a 4.20% relative improvement in human parsing over the baseline model, demonstrating gains in accuracy and generalization in complex, densely populated scenes.
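The semi-automatic annotation idea (generating instance masks from multi-view RGB+D data and 3D skeletons) can be illustrated with a minimal sketch: back-project each pixel to 3D using its depth value and the camera intrinsics, then assign it to the nearest 3D skeleton within a distance threshold. The function names, the nearest-joint assignment rule, and the `radius` parameter below are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def seed_instance_mask(depth, skeletons_3d, K, radius=2.0):
    """Coarse per-instance seed mask from depth and 3D skeletons.

    depth: (h, w) depth map in the camera frame.
    skeletons_3d: list of (J, 3) joint arrays, one per person, camera frame.
    K: (3, 3) camera intrinsics.
    Returns an (h, w) int mask: 0 = background, i = person i.

    NOTE: illustrative sketch, not the paper's annotation method.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(depth)], axis=-1)
    # Back-project every pixel to a 3D point: X = depth * K^{-1} [u, v, 1]^T
    pts = (pix @ np.linalg.inv(K).T) * depth[..., None]  # (h, w, 3)
    mask = np.zeros((h, w), dtype=np.int32)
    best = np.full((h, w), np.inf)
    for i, joints in enumerate(skeletons_3d, start=1):
        # Distance from each back-projected pixel to this skeleton's closest joint
        d = np.linalg.norm(pts[..., None, :] - joints[None, None], axis=-1).min(-1)
        hit = (d < radius) & (d < best) & (depth > 0)
        mask[hit] = i
        best[hit] = d[hit]
    return mask
```

Such seed masks would then be refined (e.g. by appearance-based region growing or manual correction) before being used as weak instance-level supervision.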
Abstract
Multi-human parsing is the task of segmenting human body parts while associating each part with the person it belongs to, combining instance-level and part-level information for fine-grained human understanding. In this work, we demonstrate that, while state-of-the-art approaches have achieved notable results on public datasets, they struggle considerably to segment people with overlapping bodies. From the intuition that overlapping people may appear separated when seen from a different point of view, we propose a novel training framework that exploits multi-view information to improve multi-human parsing models under occlusions. Our method integrates this knowledge during training, introducing a novel approach based on weak supervision on human instances and a multi-view consistency loss. Given the lack of suitable datasets in the literature, we propose a semi-automatic annotation strategy to generate human instance segmentation masks from multi-view RGB+D data and 3D human skeletons. The experiments demonstrate that the approach can achieve up to a 4.20% relative improvement on human parsing over the baseline model in occlusion scenarios.
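The multi-view consistency loss can be sketched as follows: warp the per-pixel predictions from one view into another using depth and the relative camera pose, then penalize disagreement between the warped and native predictions on pixels with a valid reprojection. The function names, the nearest-neighbour warping, and the L2 penalty below are illustrative assumptions; the paper's actual loss formulation may differ.

```python
import numpy as np

def backproject(depth, K):
    """Lift every pixel of a depth map to a 3D point in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(depth)], axis=-1).astype(float)
    return (pix @ np.linalg.inv(K).T) * depth[..., None]  # (h, w, 3)

def warp_predictions(probs_src, depth_src, K, T_src_to_dst):
    """Reproject per-pixel class probabilities from a source view into a
    destination view (shared intrinsics K, nearest-neighbour splatting).

    NOTE: illustrative sketch; occlusion handling and bilinear sampling
    are omitted for brevity.
    """
    h, w, c = probs_src.shape
    pts = backproject(depth_src, K).reshape(-1, 3)
    pts_h = np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)
    pts_dst = (pts_h @ T_src_to_dst.T)[:, :3]
    proj = pts_dst @ K.T
    uv = proj[:, :2] / np.clip(proj[:, 2:3], 1e-6, None)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    warped = np.zeros_like(probs_src)
    valid = np.zeros((h, w), dtype=bool)
    src_flat = probs_src.reshape(-1, c)
    warped[v[ok], u[ok]] = src_flat[ok]
    valid[v[ok], u[ok]] = True
    return warped, valid

def consistency_loss(probs_dst, warped_src, valid):
    """Mean squared disagreement over pixels with a valid reprojection."""
    if not valid.any():
        return 0.0
    diff = (probs_dst - warped_src)[valid]
    return float(np.mean(diff ** 2))
```

With an identity relative pose the warp is a no-op, so the loss between a prediction and its own warped copy is exactly zero; in training, the loss would instead compare independent predictions from two real views and push them toward geometric agreement in occluded regions.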