Leveraging Multi-View Weak Supervision for Occlusion-Aware Multi-Human Parsing

πŸ“… 2025-09-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address performance degradation in multi-person parsing caused by occlusion in crowded scenes, this paper proposes a weakly supervised multi-person parsing method leveraging multi-view RGB+D data. The method introduces three key contributions: (1) a multi-view consistency loss that enforces geometric constraints across views to improve segmentation robustness in occluded regions; (2) a semi-automatic annotation strategy for efficient generation of instance- and part-level masks; and (3) an end-to-end joint modeling framework that fuses 3D skeleton estimation with RGB+D features to simultaneously resolve instance and part segmentation. Evaluated on standard occlusion benchmarks, the approach achieves up to a 4.20% relative improvement over strong baselines, demonstrating significant gains in parsing accuracy and generalization capability under complex, densely populated scenarios.

Technology Category

Application Category

πŸ“ Abstract
Multi-human parsing is the task of segmenting human body parts while associating each part to the person it belongs to, combining instance-level and part-level information for fine-grained human understanding. In this work, we demonstrate that, while state-of-the-art approaches achieved notable results on public datasets, they struggle considerably in segmenting people with overlapping bodies. From the intuition that overlapping people may appear separated from a different point of view, we propose a novel training framework exploiting multi-view information to improve multi-human parsing models under occlusions. Our method integrates such knowledge during the training process, introducing a novel approach based on weak supervision on human instances and a multi-view consistency loss. Given the lack of suitable datasets in the literature, we propose a semi-automatic annotation strategy to generate human instance segmentation masks from multi-view RGB+D data and 3D human skeletons. The experiments demonstrate that the approach can achieve up to a 4.20% relative improvement on human parsing over the baseline model in occlusion scenarios.
Problem

Research questions and friction points this paper is trying to address.

Improving multi-human parsing under occlusion scenarios
Exploiting multi-view weak supervision for occlusion handling
Generating instance segmentation masks from multi-view data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view weak supervision training
Occlusion-aware multi-human parsing
Multi-view consistency loss integration
πŸ”Ž Similar Papers
No similar papers found.