Cached Multi-LoRA Composition for Multi-Concept Image Generation

📅 2025-02-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the semantic conflicts and degraded image quality that arise from multi-LoRA fusion, this paper proposes Cached Multi-LoRA (CMLoRA), a frequency-domain-driven framework. We first uncover, from a Fourier spectral perspective, LoRA's distinct modulation mechanisms on high- versus low-frequency features, a novel analytical insight. Leveraging this, we design a generalizable LoRA ranking strategy to enhance fusion coherence. Furthermore, we introduce a lightweight, training-free multi-LoRA integration architecture supporting non-uniform caching. Extensive evaluation demonstrates that CMLoRA achieves an average +2.19% improvement in CLIPScore and a +11.25% increase in MLLM win rate, significantly outperforming state-of-the-art methods including LoraHub, LoRA Composite, and LoRA Switch. Our core contributions are threefold: (1) a novel frequency-domain analytical framework for LoRA behavior; (2) a generalizable, frequency-aware LoRA ranking strategy; and (3) a training-free, computationally efficient fusion architecture with flexible, non-uniform caching.

๐Ÿ“ Abstract
Low-Rank Adaptation (LoRA) has emerged as a widely adopted technique in text-to-image models, enabling precise rendering of multiple distinct elements, such as characters and styles, in multi-concept image generation. However, current approaches face significant challenges when composing these LoRAs for multi-concept image generation, resulting in diminished generated image quality. In this paper, we initially investigate the role of LoRAs in the denoising process through the lens of the Fourier frequency domain. Based on the hypothesis that applying multiple LoRAs could lead to "semantic conflicts", we find that certain LoRAs amplify high-frequency features such as edges and textures, whereas others mainly focus on low-frequency elements, including the overall structure and smooth color gradients. Building on these insights, we devise a frequency-domain-based sequencing strategy to determine the optimal order in which LoRAs should be integrated during inference. This strategy offers a methodical and generalizable solution compared to the naive integration commonly found in existing LoRA fusion techniques. To fully leverage our proposed LoRA order sequence determination method in multi-LoRA composition tasks, we introduce a novel, training-free framework, Cached Multi-LoRA (CMLoRA), designed to efficiently integrate multiple LoRAs while maintaining cohesive image generation. With its flexible backbone for multi-LoRA fusion and a non-uniform caching strategy tailored to individual LoRAs, CMLoRA has the potential to reduce semantic conflicts in LoRA composition and improve computational efficiency. Our experimental evaluations demonstrate that CMLoRA outperforms state-of-the-art training-free LoRA fusion methods by a significant margin: it achieves an average improvement of 2.19% in CLIPScore and 11.25% in MLLM win rate compared to LoraHub, LoRA Composite, and LoRA Switch.
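The frequency-domain characterization above can be made concrete with a small sketch: score each LoRA's feature map by how much of its Fourier spectral energy lies above a radial cutoff, then order LoRAs from most high-frequency (edges, textures) to most low-frequency (structure, color gradients). This is only an illustrative NumPy approximation of the idea; the function names, the cutoff value, and the exact scoring rule are assumptions, not the paper's implementation.

```python
import numpy as np

def high_freq_ratio(feature_map: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of 2-D spectral energy above a radial frequency cutoff.

    `feature_map` stands in for one channel of a denoising feature map;
    `cutoff` is a fraction of the Nyquist radius (an assumed default).
    """
    spectrum = np.fft.fftshift(np.fft.fft2(feature_map))
    energy = np.abs(spectrum) ** 2
    h, w = feature_map.shape
    yy, xx = np.ogrid[:h, :w]
    # Radial distance from the spectrum center, normalized to [0, 1].
    r = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)
    return float(energy[r > cutoff].sum() / energy.sum())

def rank_loras(feature_maps: dict) -> list:
    """Order LoRAs from most high-frequency to most low-frequency."""
    return sorted(feature_maps,
                  key=lambda name: high_freq_ratio(feature_maps[name]),
                  reverse=True)
```

A texture-heavy LoRA's feature map (noise-like, broad spectrum) scores high and is sequenced before a style LoRA whose map is dominated by smooth, low-frequency content.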
Problem

Research questions and friction points this paper is trying to address.

Optimizing LoRA integration order
Reducing semantic conflicts in LoRA
Enhancing multi-concept image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency domain sequencing strategy
Cached Multi-LoRA framework
Non-uniform caching strategy
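The non-uniform caching idea can be sketched as follows: each LoRA's contribution is recomputed only every few denoising steps and reused from a cache in between, with per-LoRA refresh intervals so that some LoRAs are updated more often than others. This is a minimal hypothetical sketch, not CMLoRA's actual implementation; `lora_fns`, `intervals`, and the uniform averaging are all illustrative assumptions.

```python
def cached_fusion(lora_fns, intervals, latents, num_steps):
    """Fuse per-LoRA outputs over `num_steps` denoising steps.

    `lora_fns`: dict mapping LoRA name -> callable(latents, step).
    `intervals`: dict mapping LoRA name -> refresh interval in steps
    (non-uniform caching: each LoRA may refresh at a different rate).
    """
    cache = {}
    fused_per_step = []
    for step in range(num_steps):
        fused = 0.0
        for name, fn in lora_fns.items():
            if name not in cache or step % intervals[name] == 0:
                cache[name] = fn(latents, step)  # fresh LoRA forward pass
            fused += cache[name]                 # otherwise reuse cached output
        fused_per_step.append(fused / len(lora_fns))
    return fused_per_step
```

Giving high-frequency LoRAs short intervals and low-frequency LoRAs long ones is one way such a scheme could trade a small approximation error for fewer LoRA forward passes per denoising trajectory.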