P3S-Diffusion: A Selective Subject-driven Generation Framework via Point Supervision

📅 2024-12-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Precise selection and reuse of similar subjects (e.g., multiple dogs) in reference images remains challenging. Method: We propose a lightweight point-supervised subject-driven generation framework. It introduces a novel point-guided mask generation mechanism that produces extended segmentation masks end-to-end, eliminating the need for manual mask annotation or external segmentation models. Additionally, we design multi-level conditional feature injection and an attention consistency loss to enhance subject feature fidelity and context-aware alignment. Contribution/Results: Our method achieves state-of-the-art (SOTA) performance in subject fidelity and generation quality across multiple benchmarks. Point annotation reduces labeling cost by over 90% compared to conventional mask-based approaches. Inference requires no additional modules, enabling efficient, fine-grained, and multi-subject controllable image generation.

๐Ÿ“ Abstract
Recent research in subject-driven generation increasingly emphasizes the importance of selective subject features. Nevertheless, accurately selecting the content in a given reference image still poses challenges, especially when selecting among similar subjects in an image (e.g., two different dogs). Some methods attempt to use text prompts or pixel masks to isolate specific elements. However, text prompts often fall short in precisely describing specific content, and pixel masks are often expensive. To address this, we introduce P3S-Diffusion, a novel architecture designed for context-selected subject-driven generation via point supervision. P3S-Diffusion leverages minimal-cost labels (e.g., points) to generate subject-driven images. During fine-tuning, it can generate an expanded base mask from these points, obviating the need for additional segmentation models. The mask is employed for inpainting and aligning with the subject representation. P3S-Diffusion preserves fine features of the subjects through Multi-layers Condition Injection. Enhanced by the Attention Consistency Loss for improved training, extensive experiments demonstrate its excellent feature preservation and image generation capabilities.
Problem

Research questions and friction points this paper is trying to address.

Image Generation
Feature Selection
Specific Attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

P3S-Diffusion
Deep Learning
Selective Image Generation
Junjie Hu
Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China.
Shuyong Gao
Fudan University
Human Visual Attention, Generative Model, Weakly Supervised Learning
Lingyi Hong
Fudan University
Computer Vision
Qishan Wang
Fudan University
Anomaly detection
Yuzhou Zhao
Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China.
Yan Wang
Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China; Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University, Shanghai, China.
Wenqiang Zhang
Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China; Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University, Shanghai, China; Engineering Research Center of AI & Robotics, Ministry of Education, Academy for Engineering & Technology, Fudan University, Shanghai, China.