Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation

📅 2024-12-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-precision annotation for binary image segmentation is costly, while synthetic data suffers from domain shift and limited diversity. Method: We propose a mask editing paradigm integrating rigid (zero-shot geometric priors) and non-rigid (adversarial training + self-attention–based topological preservation) transformations, embedded within a multi-condition-controllable image-mask co-generation framework. Our approach synergistically incorporates diffusion-based geometric priors, zero-shot view synthesis, adversarial learning, and self-attention mechanisms to generate high-fidelity, high-resolution, and scene-diverse image-mask pairs. Results: On the DIS5K benchmark, our method significantly outperforms state-of-the-art approaches in mask accuracy and scene coverage, while reducing data preparation time and annotation cost substantially. The framework demonstrates strong scalability and generalizability across diverse segmentation scenarios.

📝 Abstract
Dichotomous Image Segmentation (DIS) tasks require highly precise annotations, and traditional dataset creation methods are labor-intensive, costly, and require extensive domain expertise. Although using synthetic data for DIS is a promising solution to these challenges, current generative models and techniques struggle with the issues of scene deviations, noise-induced errors, and limited training sample variability. To address these issues, we introduce a novel approach, MaskFactory, which provides a scalable solution for generating diverse and precise datasets, markedly reducing preparation time and costs. We first introduce a general mask editing method that combines rigid and non-rigid editing techniques to generate high-quality synthetic masks. Specifically, rigid editing leverages geometric priors from diffusion models to achieve precise viewpoint transformations under zero-shot conditions, while non-rigid editing employs adversarial training and self-attention mechanisms for complex, topologically consistent modifications. Then, we generate pairs of high-resolution images and accurate segmentation masks using a multi-conditional control generation method. Finally, our experiments on the widely used DIS5K benchmark demonstrate superior performance in quality and efficiency compared to existing methods. The code is available at https://qian-hao-tian.github.io/MaskFactory/.
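To make the two edit families concrete, here is a minimal NumPy sketch (not the paper's actual pipeline, which uses diffusion priors and adversarial training): a rigid edit modeled as a geometric rotation of a binary mask, and a non-rigid edit modeled as a smooth, small-amplitude warp that deforms the shape without tearing it. All function names and parameters are illustrative assumptions.

```python
import numpy as np

def rigid_edit(mask: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate a binary mask about its center (a toy stand-in for the
    zero-shot viewpoint transformations described in the abstract)."""
    h, w = mask.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    ys, xs = np.indices((h, w))
    # Inverse-map each output pixel back into the source mask.
    xr = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    yr = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    xi = np.clip(np.rint(xr).astype(int), 0, w - 1)
    yi = np.clip(np.rint(yr).astype(int), 0, h - 1)
    out = mask[yi, xi]
    # Zero out samples whose source location fell outside the frame.
    valid = (xr >= 0) & (xr <= w - 1) & (yr >= 0) & (yr <= h - 1)
    return np.where(valid, out, 0).astype(mask.dtype)

def nonrigid_edit(mask: np.ndarray, amplitude: float = 3.0,
                  wavelength: float = 32.0) -> np.ndarray:
    """Apply a smooth sinusoidal displacement field. A small amplitude
    relative to feature size keeps the warp topology-preserving, loosely
    mirroring the 'topologically consistent modifications' goal."""
    h, w = mask.shape
    ys, xs = np.indices((h, w))
    dx = amplitude * np.sin(2 * np.pi * ys / wavelength)
    dy = amplitude * np.cos(2 * np.pi * xs / wavelength)
    xi = np.clip(np.rint(xs + dx).astype(int), 0, w - 1)
    yi = np.clip(np.rint(ys + dy).astype(int), 0, h - 1)
    return mask[yi, xi]

# Demo: edit a square mask both ways; outputs stay strictly binary.
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:44, 20:44] = 1
rot = rigid_edit(mask, 30.0)
warp = nonrigid_edit(mask)
```

Nearest-neighbor resampling is used in both functions so the edited masks remain exactly binary, avoiding the interpolation noise that soft resampling would introduce at object boundaries.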
Problem

Research questions and friction points this paper is trying to address.

Image Segmentation
Data Diversity
Training Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mask Editing
Diffusion Model
Multi-condition Generation