LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

📅 2024-12-06
🏛️ arXiv.org
📈 Citations: 5
Influential: 0
🤖 AI Summary
Real-time fusion of content and style LoRAs in personalized image generation remains challenging due to the high computational overhead of existing optimization-based methods, which makes them unsuitable for edge devices. Method: the paper proposes a hypernetwork-based dynamic LoRA fusion paradigm that bypasses iterative optimization; instead, a lightweight hypernetwork directly predicts the merging weights, enabling millisecond-level, high-fidelity synthesis. Contribution/Results: the authors introduce an MLLM-driven joint content-style evaluation protocol to overcome biases inherent in conventional metrics, and adopt a cross-content-style generalization training strategy to improve the model's generalization to unseen pairs. Experiments show that the method accelerates merging by over 4000x compared to state-of-the-art optimization approaches while achieving superior content and style fidelity, validated by both automated MLLM assessment and human evaluation.

📝 Abstract
Recent advancements in image generation models have enabled personalized image creation with both user-defined subjects (content) and styles. Prior works achieved personalization by merging corresponding low-rank adaptation parameters (LoRAs) through optimization-based methods, which are computationally demanding and unsuitable for real-time use on resource-constrained devices like smartphones. To address this, we introduce LoRA.rar, a method that not only improves image quality but also achieves a remarkable speedup of over 4000× in the merging process. LoRA.rar pre-trains a hypernetwork on a diverse set of content-style LoRA pairs, learning an efficient merging strategy that generalizes to new, unseen content-style pairs, enabling fast, high-quality personalization. Moreover, we identify limitations in existing evaluation metrics for content-style quality and propose a new protocol using multimodal large language models (MLLM) for more accurate assessment. Our method significantly outperforms the current state of the art in both content and style fidelity, as validated by MLLM assessments and human evaluations.
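The core idea (a single hypernetwork forward pass predicts coefficients for combining a content LoRA and a style LoRA, replacing per-pair iterative optimization) can be illustrated with a toy sketch. This is a minimal illustration under assumed design choices, not the paper's actual architecture: the feature extraction, MLP sizes, and the softmax constraint on the two coefficients are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_delta(A, B):
    """Low-rank weight update contributed by one LoRA: dW = B @ A."""
    return B @ A

def hypernet(features, W1, W2):
    """Tiny MLP mapping layer features to two merge coefficients.
    A softmax keeps them positive and summing to 1 (an assumed constraint)."""
    h = np.tanh(features @ W1)
    logits = h @ W2
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy content and style LoRAs for one layer (hidden dim 8, rank 4)
d, r = 8, 4
A_c, B_c = rng.normal(size=(r, d)), rng.normal(size=(d, r))
A_s, B_s = rng.normal(size=(r, d)), rng.normal(size=(d, r))
dW_c, dW_s = lora_delta(A_c, B_c), lora_delta(A_s, B_s)

# Hypernetwork input: simple statistics of both updates (illustrative choice)
features = np.array([dW_c.mean(), dW_c.std(), dW_s.mean(), dW_s.std()])

# Pretrained hypernetwork weights would be learned over many content-style
# pairs; random values stand in here.
W1 = rng.normal(size=(4, 16)) * 0.1
W2 = rng.normal(size=(16, 2)) * 0.1
lam_c, lam_s = hypernet(features, W1, W2)

# One forward pass replaces iterative optimization: merge in a single shot.
dW_merged = lam_c * dW_c + lam_s * dW_s
print(dW_merged.shape)  # (8, 8)
```

In practice the paper predicts merging weights for every LoRA-adapted layer of the diffusion model; the speedup comes from amortizing the merging cost into pre-training, so deployment requires only this cheap inference step.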
Problem

Research questions and friction points this paper is trying to address.

Efficient merging of LoRAs for real-time image generation
Improving image quality and speed in personalization
Accurate evaluation of content-style fidelity using MLLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hypernetwork merges LoRAs efficiently
4000x speedup in merging process
MLLMs improve content-style evaluation