Unsupervised Class Generation to Expand Semantic Segmentation Datasets

📅 2025-01-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high cost of manual annotation and the difficulty of adding novel classes to closed-set semantic segmentation datasets, this paper proposes an unsupervised class-generation pipeline that requires minimal user input. The method combines Stable Diffusion (text-to-image generation) with the Segment Anything Model (SAM, zero-shot mask segmentation) to synthesize novel-class samples together with pixel-level segmentation masks. The resulting cutouts are composited into existing closed-set datasets and used with unsupervised domain adaptation methods, without modifying the underlying algorithms. Experiments show an average IoU of 51% on novel classes while also reducing errors on already existing classes, yielding higher overall performance.

📝 Abstract
Semantic segmentation is a computer vision task where classification is performed at the pixel level. Because of this, labeling images for semantic segmentation is time-consuming and expensive. To mitigate this cost there has been a surge in the use of synthetically generated data -- usually created using simulators or video games -- which, in combination with domain adaptation methods, can effectively learn how to segment real data. Still, these datasets have a particular limitation: due to their closed-set nature, it is not possible to include novel classes without modifying the tool used to generate them, which is often not public. Concurrently, generative models have made remarkable progress, particularly with the introduction of diffusion models, enabling the creation of high-quality images from text prompts without additional supervision. In this work, we propose an unsupervised pipeline that leverages Stable Diffusion and the Segment Anything Model to generate class examples with an associated segmentation mask, and a method to integrate the generated cutouts for novel classes into semantic segmentation datasets, all with minimal user input. Our approach aims to improve the performance of unsupervised domain adaptation methods by introducing novel samples into the training data without modifications to the underlying algorithms. With our methods, we show that models can not only effectively learn to segment novel classes, with an average performance of 51% IoU, but also reduce errors for other, already existing classes, reaching a higher performance level overall.
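The composition step described above -- pasting a generated cutout and its mask into an existing image while updating the per-pixel label map -- can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, argument layout, and the assumption that the cutout arrives with a boolean SAM mask are all hypothetical.

```python
import numpy as np

def paste_cutout(image, labels, cutout, mask, new_class_id, top_left):
    """Composite a generated cutout into a dataset image and its label map.

    image:  (H, W, 3) uint8 scene image from the closed-set dataset
    labels: (H, W)    integer per-pixel class map for that image
    cutout: (h, w, 3) uint8 generated object crop (e.g. from Stable Diffusion)
    mask:   (h, w)    boolean object mask (e.g. from SAM)
    new_class_id:     integer id assigned to the novel class
    top_left:         (y, x) paste position; assumed to fit inside the image
    """
    y, x = top_left
    h, w = mask.shape
    # Copy only the object pixels into the scene, leaving the background intact.
    region = image[y:y + h, x:x + w]
    region[mask] = cutout[mask]
    # Mark the same pixels with the novel class id in the label map.
    labels[y:y + h, x:x + w][mask] = new_class_id
    return image, labels
```

In practice a pipeline like this would also randomize the paste position and scale so the novel class appears in varied contexts during domain-adaptive training.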
Problem

Research questions and friction points this paper is trying to address.

Automatic Dataset Enrichment
Computer Vision
Zero-shot Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Models
Stable Diffusion
Automatic Image Annotation