Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models

📅 2025-08-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address low personalization accuracy, heavy reliance on explicit user intervention, and the limited token capacity of text encoders in text-to-image (T2I) diffusion models, this paper proposes DrUM: a lightweight Transformer adapter that operates in the latent space and conditions generation on structured user-profile embeddings, without fine-tuning the base model. DrUM is the first method to integrate structured user profiles directly into the conditioning mechanism of diffusion models; it is compatible with open-source text encoders and injects user signals dynamically in the latent space, substantially improving personalized representation. Experiments show that DrUM is plug-and-play on mainstream T2I models (e.g., Stable Diffusion), requires only minimal user data to generate high-fidelity, semantically consistent personalized images, and outperforms existing adapter-based approaches across multiple benchmarks.
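The summary describes a lightweight Transformer adapter that injects a user-profile embedding into the text condition in latent space, leaving the base T2I model frozen. A minimal sketch of that idea is below; the class name, dimensions, and fusion scheme (prepending the profile as an extra token) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ProfileAdapter(nn.Module):
    """Hypothetical DrUM-style adapter sketch: a small Transformer that
    fuses a user-profile embedding with frozen text-encoder outputs
    before they reach the diffusion model's cross-attention.
    All names and sizes here are illustrative assumptions."""

    def __init__(self, dim: int = 768, heads: int = 8, layers: int = 2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        # Project the aggregated user-preference vector into token space.
        self.profile_proj = nn.Linear(dim, dim)

    def forward(self, text_cond: torch.Tensor, profile_emb: torch.Tensor) -> torch.Tensor:
        # text_cond: (B, T, dim) output of a frozen open-source text encoder
        # profile_emb: (B, dim) aggregated user-profile embedding
        profile_tok = self.profile_proj(profile_emb).unsqueeze(1)  # (B, 1, dim)
        fused = torch.cat([profile_tok, text_cond], dim=1)  # prepend profile token
        fused = self.encoder(fused)
        # Drop the profile token so the sequence length matches what
        # the T2I model expects (no change to the frozen base model).
        return fused[:, 1:, :]

# Usage: shapes mimic CLIP-style text embeddings (77 tokens, dim 768).
adapter = ProfileAdapter()
cond = torch.randn(2, 77, 768)
profile = torch.randn(2, 768)
out = adapter(cond, profile)
print(out.shape)  # torch.Size([2, 77, 768])
```

Because the adapter preserves the condition tensor's shape, it can be dropped between the text encoder and the diffusion model without retraining either, which matches the plug-and-play claim in the summary.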

📝 Abstract
Personalized generation in T2I diffusion models aims to naturally incorporate individual user preferences into the generation process with minimal user intervention. However, existing studies primarily rely on prompt-level modeling with large-scale models, often leading to inaccurate personalization due to the limited input token capacity of T2I diffusion models. To address these limitations, we propose DrUM, a novel method that integrates user profiling with a transformer-based adapter to enable personalized generation through condition-level modeling in the latent space. DrUM demonstrates strong performance on large-scale datasets and seamlessly integrates with open-source text encoders, making it compatible with widely used foundation T2I models without requiring additional fine-tuning.
Problem

Research questions and friction points this paper is trying to address.

Incorporate user preferences into T2I generation with minimal user intervention
Overcome the personalization-accuracy limits of prompt-level modeling
Enable condition-level modeling via user profiling and a Transformer adapter
Innovation

Methods, ideas, or system contributions that make the work stand out.

Condition-level modeling in latent space
Transformer-based adapter for user profiling
Compatible with open-source text encoders