Parallel Rescaling: Rebalancing Consistency Guidance for Personalized Diffusion Models

📅 2025-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Few-shot personalized diffusion models struggle to preserve subject identity while staying aligned with the text prompt, especially for complex or stylized prompts. Parallel Rescaling addresses this by decomposing the consistency guidance signal into components parallel and orthogonal to classifier-free guidance (CFG) and rescaling only the parallel component. This limits interference with CFG while preserving subject identity, and it requires no additional training data or annotations. Experiments report improved identity consistency and text alignment across diverse stylized prompts, relative to baselines including DreamBooth, Textual Inversion, and DCO.

📝 Abstract

Personalizing diffusion models to specific users or concepts remains challenging, particularly when only a few reference images are available. Existing methods such as DreamBooth and Textual Inversion often overfit to limited data, causing misalignment between generated images and text prompts when attempting to balance identity fidelity with prompt adherence. While Direct Consistency Optimization (DCO) with its consistency-guided sampling partially alleviates this issue, it still struggles with complex or stylized prompts. In this paper, we propose a parallel rescaling technique for personalized diffusion models. Our approach explicitly decomposes the consistency guidance signal into parallel and orthogonal components relative to classifier-free guidance (CFG). By rescaling the parallel component, we minimize disruptive interference with CFG while preserving the subject's identity. Unlike prior personalization methods, our technique does not require additional training data or expensive annotations. Extensive experiments show improved prompt alignment and visual fidelity compared to baseline methods, even on challenging stylized prompts. These findings highlight the potential of parallel-rescaled guidance to yield more stable and accurate personalization for diverse user inputs.
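The decomposition described in the abstract can be written as a standard vector projection. The following is a sketch under stated assumptions, not the paper's exact formulation: $g_{\mathrm{cfg}}$ and $g_{\mathrm{con}}$ are assumed names for the CFG and consistency guidance directions at a given sampling step, and $\lambda$ is a hypothetical rescaling factor whose value or schedule is not given here.

```latex
% Assumed symbols: g_cfg (CFG direction), g_con (consistency guidance direction),
% lambda (hypothetical rescaling factor). None of these names come from the source.
\begin{align}
  g_{\mathrm{con}}^{\parallel}
    &= \frac{\langle g_{\mathrm{con}},\, g_{\mathrm{cfg}} \rangle}
            {\lVert g_{\mathrm{cfg}} \rVert^{2}}\; g_{\mathrm{cfg}}, \\
  g_{\mathrm{con}}^{\perp}
    &= g_{\mathrm{con}} - g_{\mathrm{con}}^{\parallel}, \\
  \tilde{g}_{\mathrm{con}}
    &= \lambda\, g_{\mathrm{con}}^{\parallel} + g_{\mathrm{con}}^{\perp}.
\end{align}
```

By construction $\langle g_{\mathrm{con}}^{\perp}, g_{\mathrm{cfg}} \rangle = 0$, so changing $\lambda$ only alters how much the consistency signal pushes along the CFG direction, and setting $\lambda = 1$ recovers the original consistency guidance.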

Problem

Research questions and friction points this paper is trying to address.

Overfitting in personalized diffusion models with few reference images

Misalignment between generated images and text prompts

Difficulty handling complex or stylized prompts in existing methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes the consistency guidance into components parallel and orthogonal to CFG

Rescales parallel component to minimize CFG interference

Requires no extra training data or annotations
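The first two points above can be illustrated with a minimal NumPy sketch. It assumes the two guidance signals are available as arrays of the same shape for a single sample; the function names, the `scale` parameter, and the small `eps` stabilizer are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def decompose(g_con, g_cfg, eps=1e-12):
    """Split g_con into parts parallel and orthogonal to g_cfg.

    Both inputs are arrays of the same shape (one sample). They are
    flattened for the inner products, so the projection treats each
    array as a single vector.
    """
    flat_con = g_con.reshape(-1)
    flat_cfg = g_cfg.reshape(-1)
    # Projection coefficient <g_con, g_cfg> / ||g_cfg||^2 (eps avoids 0/0).
    coef = (flat_con @ flat_cfg) / (flat_cfg @ flat_cfg + eps)
    parallel = coef * g_cfg
    orthogonal = g_con - parallel
    return parallel, orthogonal


def parallel_rescale(g_con, g_cfg, scale):
    """Rescale only the component of g_con that is parallel to g_cfg.

    `scale` is a hypothetical factor; how it is chosen is an assumption
    left open in this sketch.
    """
    parallel, orthogonal = decompose(g_con, g_cfg)
    return scale * parallel + orthogonal


if __name__ == "__main__":
    g_cfg = np.array([2.0, 0.0])
    g_con = np.array([3.0, 4.0])
    print(parallel_rescale(g_con, g_cfg, scale=0.5))
```

With `scale=1.0` the function returns the consistency guidance unchanged, which makes it easy to compare against an unmodified sampler.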
