🤖 AI Summary
Weakly supervised semantic segmentation (WSSS) relies on image-level labels, yet existing methods overemphasize inter-class discriminability while neglecting semantic sharing among visually or semantically similar categories, leading to ambiguous class activation maps and suboptimal segmentation performance. To address this, we introduce large language models (LLMs) into WSSS for the first time, leveraging prompt engineering to generate interpretable, semantically coherent category clusters that explicitly model inter-class commonalities—not just distinctions. We further propose a unified framework comprising category embedding clustering, relation-aware feature fusion, and multi-stage weakly supervised training to enable pixel-level guidance grounded in shared semantics. Evaluated on PASCAL VOC 2012, our method achieves a new state-of-the-art mIoU of 72.3%, demonstrating significant improvements in both accuracy and robustness. This work validates the effectiveness and generalizability of LLM-driven semantic clustering for WSSS.
📝 Abstract
Weakly Supervised Semantic Segmentation (WSSS), which leverages image-level labels, has garnered significant attention due to its cost-effectiveness. Previous methods mainly strengthen inter-class differences to avoid semantic ambiguity between classes, which can lead to erroneous activations. However, they overlook the benefit of information shared between similar classes. Categories within the same cluster share similar features, and allowing the model to recognize these features can further relieve the semantic ambiguity between them. To identify and exploit this shared information, we introduce a novel WSSS framework called Prompt Categories Clustering (PCC). Specifically, we explore the ability of Large Language Models (LLMs) to derive category clusters through prompts. These clusters represent the intrinsic relationships between categories. By integrating this relational information into the training network, our model better learns the hidden connections between categories. Experimental results on the PASCAL VOC 2012 dataset demonstrate the effectiveness of our approach, which surpasses existing state-of-the-art WSSS methods.
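To make the idea of prompt-derived category clusters more concrete, here is a minimal Python sketch. It is not the PCC implementation. The prompt wording, the JSON answer format, the helper names (`build_cluster_prompt`, `parse_clusters`, `cluster_labels`), and the example grouping in `EXAMPLE_RESPONSE` are all assumptions made for illustration. No LLM is called: the response is a hand-written stand-in for what a model might return. The sketch only shows how a cluster assignment could be turned into an extra cluster-level label derived from the image-level class labels.

```python
import json

# The 20 object categories of PASCAL VOC 2012.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def build_cluster_prompt(classes):
    """Build a clustering prompt (hypothetical wording, not from the paper)."""
    names = ", ".join(classes)
    return (
        "Group the following object categories into clusters of semantically "
        "or visually similar categories. Every category must appear in "
        "exactly one cluster. Answer with a JSON object that maps a cluster "
        "name to a list of categories.\n"
        f"Categories: {names}"
    )


def parse_clusters(response_text, classes):
    """Parse a JSON answer into a class -> cluster-index mapping."""
    clusters = json.loads(response_text)
    class_to_cluster = {}
    for cluster_id, members in enumerate(clusters.values()):
        for name in members:
            if name not in classes:
                raise ValueError(f"unknown category: {name}")
            if name in class_to_cluster:
                raise ValueError(f"category in two clusters: {name}")
            class_to_cluster[name] = cluster_id
    missing = [c for c in classes if c not in class_to_cluster]
    if missing:
        raise ValueError(f"categories without a cluster: {missing}")
    return class_to_cluster, len(clusters)


def cluster_labels(image_classes, class_to_cluster, num_clusters):
    """Turn image-level class labels into a multi-hot cluster label vector."""
    vec = [0] * num_clusters
    for name in image_classes:
        vec[class_to_cluster[name]] = 1
    return vec


# Hand-written stand-in for an LLM answer (illustrative grouping only).
EXAMPLE_RESPONSE = json.dumps({
    "vehicles": ["aeroplane", "bicycle", "boat", "bus", "car",
                 "motorbike", "train"],
    "animals": ["bird", "cat", "cow", "dog", "horse", "sheep"],
    "person": ["person"],
    "indoor objects": ["bottle", "chair", "diningtable", "pottedplant",
                       "sofa", "tvmonitor"],
})

if __name__ == "__main__":
    mapping, n = parse_clusters(EXAMPLE_RESPONSE, VOC_CLASSES)
    print(build_cluster_prompt(VOC_CLASSES))
    print(cluster_labels(["cat", "dog"], mapping, n))
```

In a setup like this, an image labeled with both `cat` and `dog` activates a single shared cluster, which is one simple way to expose the shared information between similar classes as an additional supervision signal. How PCC actually integrates the clusters into the training network is not specified in the abstract.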