AI Summary
Existing diffusion models struggle to simultaneously achieve precision and flexibility in element-level image generation and editing. To address this, BlobCtrl introduces a probabilistic blob representation paradigm that disentangles an element's position, semantics, and identity, enabling fine-grained, controllable manipulation. Methodologically, it employs a dual-branch hierarchical fusion diffusion architecture, integrating self-supervised training via a customized score function and a controllable dropout mechanism to dynamically balance fidelity and diversity. Technically, it unifies blob-based representation, hierarchical feature fusion, and element-aware data augmentation. Evaluated on the newly constructed benchmark BlobBench, BlobCtrl achieves state-of-the-art performance with high computational efficiency. Trained and validated on the large-scale blob-centric dataset BlobData, the model significantly improves controllability, consistency, and generalization in element-level editing.
Abstract
Element-level visual manipulation is essential in digital content creation, but current diffusion-based methods lack the precision and flexibility of traditional tools. In this work, we introduce BlobCtrl, a framework that unifies element-level generation and editing using a probabilistic blob-based representation. By employing blobs as visual primitives, our approach effectively decouples and represents spatial location, semantic content, and identity information, enabling precise element-level manipulation. Our key contributions include: 1) a dual-branch diffusion architecture with hierarchical feature fusion for seamless foreground-background integration; 2) a self-supervised training paradigm with tailored data augmentation and score functions; and 3) controllable dropout strategies to balance fidelity and diversity. To support further research, we introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experiments show that BlobCtrl excels in various element-level manipulation tasks while maintaining computational efficiency, offering a practical solution for precise and flexible visual content creation. Project page: https://liyaowei-stu.github.io/project/BlobCtrl/
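To make the idea of a blob as a visual primitive more concrete, here is a minimal illustrative sketch. It assumes a blob can be modeled as a 2D Gaussian (a center plus a covariance matrix) rendered into a soft opacity map; the abstract does not specify the parameterization, so this is an assumption and not the authors' implementation. The function names `render_blob` and `move_blob` are hypothetical.

```python
import numpy as np

def render_blob(height, width, center, cov):
    """Render an unnormalized 2D Gaussian as a soft opacity map in [0, 1].

    Assumption: a blob is described by a center (x, y) and a 2x2 covariance.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    # Offset of every pixel from the blob center, shape (H, W, 2) in (x, y) order.
    d = np.stack([xs - center[0], ys - center[1]], axis=-1).astype(np.float64)
    inv = np.linalg.inv(np.asarray(cov, dtype=np.float64))
    # Squared Mahalanobis distance of each pixel to the blob center.
    m2 = np.einsum("hwi,ij,hwj->hw", d, inv, d)
    return np.exp(-0.5 * m2)

def move_blob(center, shift):
    """Hypothetical edit: translate a blob by changing only its center."""
    return (center[0] + shift[0], center[1] + shift[1])

cov = [[60.0, 0.0], [0.0, 25.0]]  # wider than tall, axis-aligned ellipse
opacity = render_blob(64, 64, center=(20, 32), cov=cov)
moved = render_blob(64, 64, center=move_blob((20, 32), (24, 0)), cov=cov)
```

In this sketch, spatial edits such as moving or resizing an element reduce to changing a handful of blob parameters, while semantic content and identity would have to be supplied by separate signals. This is one way to picture the decoupling the abstract describes.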