Create Anything Anywhere: Layout-Controllable Personalized Diffusion Model for Multiple Subjects

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing personalized diffusion models struggle to simultaneously achieve precise spatial layout control and faithful preservation of multiple subjects’ identities. This paper proposes a fine-tuning-free framework for multi-subject personalized image generation, introducing two key innovations: (1) a dynamic-static complementary visual refinement module that captures temporal or pose variations in reference images via dynamic feature modeling, while distilling static identity cues and leveraging cross-attention guidance; and (2) a two-stage layout control mechanism imposing explicit spatial constraints during both training and inference. Together, these enable pixel-level layout controllability and consistent cross-subject identity preservation. Extensive evaluations across multiple benchmarks demonstrate significant improvements: +23.6% in ID-Retrieval accuracy and +31.4% in IoU. The framework supports arbitrary subject identity specification and customizable spatial layouts even in complex scenes.

📝 Abstract
Diffusion models have significantly advanced text-to-image generation, laying the foundation for the development of personalized generative frameworks. However, existing methods lack precise layout controllability and overlook the potential of dynamic features of reference subjects for improving fidelity. In this work, we propose the Layout-Controllable Personalized Diffusion (LCP-Diffusion) model, a novel framework that integrates subject identity preservation with flexible layout guidance in a tuning-free approach. Our model employs a Dynamic-Static Complementary Visual Refining module to comprehensively capture the intricate details of reference subjects, and introduces a Dual Layout Control mechanism to enforce robust spatial control across both training and inference stages. Extensive experiments validate that LCP-Diffusion excels in both identity preservation and layout controllability. To the best of our knowledge, this is a pioneering work enabling users to "create anything anywhere".
Problem

Research questions and friction points this paper is trying to address.

Lack of precise layout controllability in personalized diffusion models
Dynamic features of reference subjects overlooked as a source of fidelity gains
Need for tuning-free integration of identity preservation and layout guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tuning-free personalized diffusion model
Dynamic-Static Complementary Visual Refining
Dual Layout Control mechanism
Wei Li
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
Hebei Li
PhD, University of Science and Technology of China
Event camera · Neuromorphic · 3D
Yansong Peng
University of Science and Technology of China
AI · AIGC · Computer Vision · Object Detection
Siying Wu
Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Yueyi Zhang
Miromind; previously University of Science and Technology of China
Structured light · Depth Sensing · Event Camera · Medical Imaging
Xiaoyan Sun
Microsoft Research Asia
Image/Video Coding · Multimedia Processing · Computer Vision