P3S-Diffusion: A Selective Subject-driven Generation Framework via Point Supervision

📅 2024-12-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Precise selection and reuse of similar subjects (e.g., multiple dogs) in reference images remains challenging. Method: We propose a lightweight point-supervised subject-driven generation framework. It introduces a novel point-guided mask generation mechanism that produces extended segmentation masks end-to-end, eliminating the need for manual mask annotation or external segmentation models. Additionally, we design multi-level conditional feature injection and an attention consistency loss to enhance subject feature fidelity and context-aware alignment. Contribution/Results: Our method achieves state-of-the-art (SOTA) performance in subject fidelity and generation quality across multiple benchmarks. Point annotation reduces labeling cost by over 90% compared to conventional mask-based approaches. Inference requires no additional modules, enabling efficient, fine-grained, and multi-subject controllable image generation.

๐Ÿ“ Abstract
Recent research in subject-driven generation increasingly emphasizes the importance of selective subject features. Nevertheless, accurately selecting the content in a given reference image still poses challenges, especially when selecting among similar subjects in an image (e.g., two different dogs). Some methods attempt to use text prompts or pixel masks to isolate specific elements. However, text prompts often fall short in precisely describing specific content, and pixel masks are often expensive. To address this, we introduce P3S-Diffusion, a novel architecture designed for context-selected subject-driven generation via point supervision. P3S-Diffusion leverages minimal-cost labels (e.g., points) to generate subject-driven images. During fine-tuning, it can generate an expanded base mask from these points, obviating the need for additional segmentation models. The mask is employed for inpainting and aligning with the subject representation. P3S-Diffusion preserves fine features of the subjects through Multi-layers Condition Injection. Enhanced by the Attention Consistency Loss for improved training, extensive experiments demonstrate its excellent feature preservation and image generation capabilities.
Problem

Research questions and friction points this paper is trying to address.

Image Generation
Feature Selection
Specific Attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

P3S-Diffusion
Deep Learning
Selective Image Generation
Junjie Hu
Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China.
Shuyong Gao
Fudan University
Human Visual Attention, Generative Model, Weakly Supervised Learning
Lingyi Hong
Fudan University
Computer Vision
Qishan Wang
Fudan University
Anomaly detection
Yuzhou Zhao
Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China.
Yan Wang
Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China; Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University, Shanghai, China.
Wenqiang Zhang
Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China; Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University, Shanghai, China; Engineering Research Center of AI & Robotics, Ministry of Education, Academy for Engineering & Technology, Fudan University, Shanghai, China.