DynASyn: Multi-Subject Personalization Enabling Dynamic Action Synthesis

📅 2025-03-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing text-to-image diffusion models struggle with multi-subject personalized dynamic action synthesis from a single reference image, particularly when modifying subject behaviors and interactions. This paper introduces DynASyn, a concept-driven multi-subject personalization framework. Its core contributions are: (1) concept-based regularization of cross-attention maps between subject tokens and images, aligning subject identity with action semantics; (2) concept-based prompt-and-image augmentation, improving the trade-off between identity preservation and action diversity; and (3) SDE-based editing guided by augmented prompts, generating diverse appearances and actions while maintaining identity consistency. Under single-image guidance, the method improves identity fidelity, motion diversity, and interaction plausibility, and quantitative and qualitative evaluations show consistent gains over state-of-the-art baselines.

📝 Abstract
Recent advances in text-to-image diffusion models spurred research on personalization, i.e., a customized image synthesis, of subjects within reference images. Although existing personalization methods are able to alter the subjects' positions or to personalize multiple subjects simultaneously, they often struggle to modify the behaviors of subjects or their dynamic interactions. The difficulty is attributable to overfitting to reference images, which worsens if only a single reference image is available. We propose DynASyn, an effective multi-subject personalization from a single reference image addressing these challenges. DynASyn preserves the subject identity in the personalization process by aligning concept-based priors with subject appearances and actions. This is achieved by regularizing the attention maps between the subject token and images through concept-based priors. In addition, we propose concept-based prompt-and-image augmentation for an enhanced trade-off between identity preservation and action diversity. We adopt an SDE-based editing guided by augmented prompts to generate diverse appearances and actions while maintaining identity consistency in the augmented images. Experiments show that DynASyn is capable of synthesizing highly realistic images of subjects with novel contexts and dynamic interactions with the surroundings, and outperforms baseline methods in both quantitative and qualitative aspects.
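The attention-map regularization described in the abstract — aligning the subject token's cross-attention with a concept-based prior — can be sketched as a simple divergence loss. This is a minimal illustration under assumed conventions, not the paper's implementation: the helper name `attention_prior_loss` and the KL formulation are assumptions for clarity.

```python
import torch


def attention_prior_loss(subject_attn: torch.Tensor,
                         prior_attn: torch.Tensor,
                         eps: float = 1e-8) -> torch.Tensor:
    """Penalize divergence between the subject token's cross-attention
    map and a concept-prior attention map (illustrative sketch only).

    subject_attn, prior_attn: (B, H, W) non-negative attention maps.
    """
    b = subject_attn.shape[0]
    # Flatten and normalize each map into a spatial distribution.
    p = subject_attn.reshape(b, -1)
    q = prior_attn.reshape(b, -1)
    p = p / (p.sum(dim=-1, keepdim=True) + eps)
    q = q / (q.sum(dim=-1, keepdim=True) + eps)
    # KL(p || q): encourages the subject token to attend where the
    # concept prior attends, discouraging overfit attention patterns.
    return (p * ((p + eps).log() - (q + eps).log())).sum(dim=-1).mean()
```

In training, such a term would be added to the usual diffusion denoising loss with a small weight, so identity-specific attention stays anchored to the broader concept's spatial support.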
Problem

Research questions and friction points this paper is trying to address.

Modify subject behaviors and dynamic interactions in images
Prevent overfitting when personalizing from a single reference image
Balance identity preservation with action diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns concept-based priors with subject appearances and actions
Regularizes attention maps via concept-based priors
Uses SDE-based editing for diverse appearances and actions
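The SDE-based editing in the last bullet follows the general noise-then-denoise recipe (as in SDEdit): a source image is diffused to an intermediate timestep, then denoised under an augmented prompt. A minimal sketch under assumed DDPM/DDIM conventions — the helper `sde_edit` and the `denoiser(x_t, t)` interface are hypothetical, not the paper's actual code:

```python
import torch


def sde_edit(x0: torch.Tensor, denoiser, alphas_cumprod: torch.Tensor,
             t_edit: int) -> torch.Tensor:
    """Noise an image to an intermediate timestep, then denoise it back.

    denoiser(x_t, t) is assumed to predict the added noise eps
    (DDPM-style, conditioned internally on the augmented prompt).
    alphas_cumprod: 1-D tensor of cumulative alpha-bar values.
    t_edit controls the edit strength: larger values allow more change.
    """
    a_bar = alphas_cumprod[t_edit]
    noise = torch.randn_like(x0)
    # Forward diffusion directly to timestep t_edit.
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    # Deterministic DDIM-style reverse steps back to t = 0.
    for t in range(t_edit, 0, -1):
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t - 1]
        eps = denoiser(x_t, t)
        x0_pred = (x_t - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()
        x_t = a_prev.sqrt() * x0_pred + (1.0 - a_prev).sqrt() * eps
    return x_t
```

Because only partial noise is added, low-frequency identity cues from the reference survive the edit while the prompt steers appearance and action, which is the trade-off the method exploits.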
Yongjin Choi
Innerverz
Diffusion model · Generative AI
Chanhun Park
Department of Computer Science, Korea University, Seoul, Korea
Seung Jun Baek
Korea University