🤖 AI Summary
To address low personalization accuracy, heavy reliance on explicit user intervention, and the limited token capacity of text encoders in text-to-image (T2I) diffusion models, this paper proposes DrUM, a lightweight Transformer adapter that operates in the latent space and conditions generation on structured user-profile embeddings without fine-tuning the base model. DrUM is the first method to integrate structured user profiles directly into the conditioning mechanism of diffusion models; it is compatible with open-source text encoders and dynamically injects user signals in the latent space, substantially improving personalized representation. Experiments show that DrUM is plug-and-play on mainstream T2I models (e.g., Stable Diffusion), generates high-fidelity, semantically consistent personalized images from minimal user data, and outperforms existing adapter-based approaches across multiple benchmarks.
📝 Abstract
Personalized generation in T2I diffusion models aims to naturally incorporate individual user preferences into the generation process with minimal user intervention. However, existing studies primarily rely on prompt-level modeling with large-scale models, often leading to inaccurate personalization due to the limited input token capacity of T2I diffusion models. To address these limitations, we propose DrUM, a novel method that integrates user profiling with a transformer-based adapter to enable personalized generation through condition-level modeling in the latent space. DrUM demonstrates strong performance on large-scale datasets and seamlessly integrates with open-source text encoders, making it compatible with widely used foundation T2I models without requiring additional fine-tuning.
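The condition-level modeling described above can be sketched as a small adapter that fuses user-profile embeddings into the text encoder's condition embeddings before they reach the frozen diffusion model. The sketch below is a minimal, hypothetical illustration (not the paper's actual architecture): the class name `ProfileAdapter`, the single-head cross-attention fusion, and all dimensions are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class ProfileAdapter:
    """Hypothetical sketch of condition-level personalization:
    prompt condition embeddings attend to user-profile embeddings
    via single-head cross-attention, and the result is injected
    residually. Only the adapter holds trainable weights; the
    text encoder and diffusion backbone stay frozen."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(dim)
        self.Wq = rng.normal(0, s, (dim, dim))  # queries from prompt condition
        self.Wk = rng.normal(0, s, (dim, dim))  # keys from user profile
        self.Wv = rng.normal(0, s, (dim, dim))  # values from user profile

    def __call__(self, cond, profile):
        # cond: (seq_len, dim) embeddings from the text encoder
        # profile: (n_prefs, dim) embeddings of user preferences
        q = cond @ self.Wq
        k = profile @ self.Wk
        v = profile @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(cond.shape[-1]))
        # Residual injection preserves the original prompt semantics
        # while biasing generation toward the user's preferences.
        return cond + attn @ v
```

Because the adapter only rewrites the conditioning tensor, it sidesteps the prompt-token budget: user preferences never consume input tokens, which is one way to read the paper's claim about avoiding token-capacity limits.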