DisFaceRep: Representation Disentanglement for Co-occurring Facial Components in Weakly Supervised Face Parsing

📅 2025-08-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Weakly supervised face parsing (WSFP) relies only on weak supervision, such as image-level labels and natural language descriptions, to reduce annotation costs; however, high co-occurrence and visual similarity among facial components lead to ambiguous activations and suboptimal segmentation performance. To address this, we propose DisFaceRep, an explicit–implicit disentanglement framework: (1) a co-occurring component disentanglement strategy explicitly reduces dataset-level bias, and (2) a text-guided component disentanglement loss uses language supervision to implicitly separate facial parts. The approach combines weakly supervised semantic segmentation, representation disentanglement, and language supervision. Extensive experiments on CelebAMask-HQ, LaPa, and Helen show that DisFaceRep significantly outperforms existing weakly supervised semantic segmentation methods on this task.

📝 Abstract

Face parsing aims to segment facial images into key components such as eyes, lips, and eyebrows. While existing methods rely on dense pixel-level annotations, such annotations are expensive and labor-intensive to obtain. To reduce annotation cost, we introduce Weakly Supervised Face Parsing (WSFP), a new task setting that performs dense facial component segmentation using only weak supervision, such as image-level labels and natural language descriptions. WSFP introduces unique challenges due to the high co-occurrence and visual similarity of facial components, which lead to ambiguous activations and degraded parsing performance. To address this, we propose DisFaceRep, a representation disentanglement framework designed to separate co-occurring facial components through both explicit and implicit mechanisms. Specifically, we introduce a co-occurring component disentanglement strategy to explicitly reduce dataset-level bias, and a text-guided component disentanglement loss to implicitly guide component separation using language supervision. Extensive experiments on CelebAMask-HQ, LaPa, and Helen demonstrate the difficulty of WSFP and the effectiveness of DisFaceRep, which significantly outperforms existing weakly supervised semantic segmentation methods. The code will be released at https://github.com/CVI-SZU/DisFaceRep.
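
To make the co-occurrence problem in the abstract concrete, the sketch below measures how often one facial component appears alongside another when only image-level labels are available. It is an illustration of the dataset-level bias, not the DisFaceRep method itself; the function name `cooccurrence_rates`, the class indices, and the toy labels are all hypothetical.

```python
from itertools import combinations


def cooccurrence_rates(labels, num_classes):
    """Return rates[i][j] = P(class j present | class i present).

    labels: list of sets, each holding the class indices present in one
    image (image-level labels only, no pixel masks).
    """
    count = [0] * num_classes
    joint = [[0] * num_classes for _ in range(num_classes)]
    for present in labels:
        for i in present:
            count[i] += 1
        # Count every unordered pair once, then mirror it.
        for i, j in combinations(sorted(present), 2):
            joint[i][j] += 1
            joint[j][i] += 1
    return [
        [joint[i][j] / count[i] if count[i] and i != j else 0.0
         for j in range(num_classes)]
        for i in range(num_classes)
    ]


# Hypothetical toy data: 0 = eyes, 1 = lips, 2 = eyebrows, 3 = hat
toy_labels = [{0, 1, 2}, {0, 1, 2}, {0, 1, 2, 3}, {0, 1}]
rates = cooccurrence_rates(toy_labels, num_classes=4)
```

In this toy set, eyes and lips appear together in every image, so the label "eyes" alone gives a classifier no signal for telling the two regions apart. That is the kind of ambiguity a disentanglement strategy has to resolve.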

Problem

Research questions and friction points this paper is trying to address.

Reducing annotation cost for face parsing with weak supervision

Separating co-occurring facial components to improve parsing accuracy

Leveraging text guidance to disentangle visually similar facial features

Innovation

Methods, ideas, or system contributions that make the work stand out.

Weakly supervised face parsing with image-level labels

Disentanglement framework for co-occurring facial components

Text-guided loss for implicit component separation
