FreeFuse: Multi-Subject LoRA Fusion via Auto Masking at Test Time

📅 2025-10-27

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing LoRA fusion methods for multi-subject text-to-image generation require additional training, auxiliary models, or manual annotations. This paper proposes FreeFuse, a fully training-free approach that fuses multiple subject LoRAs automatically at inference time. The method uses cross-attention weights from the diffusion model to derive context-aware, dynamic subject masks, which are applied to the LoRA outputs so that each LoRA acts on its own subject's region. A mathematical analysis shows that this closely approximates integrating each subject LoRA into the model and using it alone within its masked region. The method fits into standard text-to-image workflows and requires only the LoRA activation words from the user. Experiments show that it outperforms existing approaches in both generation quality and usability, with no additional training, no auxiliary models, and no manual region annotations.


📝 Abstract

This paper proposes FreeFuse, a novel training-free approach for multi-subject text-to-image generation through automatic fusion of multiple subject LoRAs. In contrast to existing methods that either focus on pre-inference LoRA weight merging or rely on segmentation models and complex techniques like noise blending to isolate LoRA outputs, our key insight is that context-aware dynamic subject masks can be automatically derived from cross-attention layer weights. Mathematical analysis shows that directly applying these masks to LoRA outputs during inference closely approximates the case where the subject LoRA is integrated into the diffusion model and used individually for the masked region. FreeFuse demonstrates superior practicality and efficiency as it requires no additional training, no modification to LoRAs, no auxiliary models, and no user-defined prompt templates or region specifications. Instead, it only requires users to provide the LoRA activation words for seamless integration into standard workflows. Extensive experiments validate that FreeFuse outperforms existing approaches in both generation quality and usability on multi-subject generation tasks. The project page is at https://future-item.github.io/FreeFuse/
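The claim about masked LoRA outputs can be illustrated on a single linear layer. The sketch below is not the FreeFuse implementation; the function names (`fused_layer`, `single_lora_layer`) and the toy matrices are hypothetical, and it assumes hard 0/1 masks. Under that assumption, at a position owned by subject k the masked sum `W x + sum_k m_k * (B_k A_k x)` reduces exactly to `(W + B_k A_k) x`, which is the output of that one LoRA merged into the layer. Across a full diffusion model the abstract only states an approximation, so this shows the per-layer intuition and nothing more.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_delta(A, B, x):
    """Low-rank update B @ (A @ x); A is r x d_in, B is d_out x r."""
    return matvec(B, matvec(A, x))


def fused_layer(W, loras, masks, xs):
    """Base output plus mask-weighted LoRA outputs, per image position.

    xs[p]       : input vector at position p
    loras[k]    : (A_k, B_k) pair for subject k
    masks[k][p] : weight of subject k at position p (assumed 0 or 1 here)
    """
    outputs = []
    for p, x in enumerate(xs):
        y = matvec(W, x)
        for k, (A, B) in enumerate(loras):
            delta = lora_delta(A, B, x)
            y = [yi + masks[k][p] * di for yi, di in zip(y, delta)]
        outputs.append(y)
    return outputs


def single_lora_layer(W, A, B, x):
    """Output of the layer with exactly one LoRA merged in: (W + B A) x."""
    return [yi + di for yi, di in zip(matvec(W, x), lora_delta(A, B, x))]


# Toy example: 2 positions, 2 subjects, rank-1 LoRAs (all values hypothetical).
W = [[1, 0], [0, 1]]
loras = [([[1, 1]], [[1], [2]]), ([[1, -1]], [[3], [0]])]
xs = [[1, 2], [3, 4]]
masks = [[1, 0], [0, 1]]  # subject 0 owns position 0, subject 1 owns position 1
fused = fused_layer(W, loras, masks, xs)
```

Here `fused[0]` matches `single_lora_layer(W, *loras[0], xs[0])` and `fused[1]` matches the same call for the second LoRA, so each region behaves as if only its own LoRA were present in this layer.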

Problem

Research questions and friction points this paper is trying to address.

Automatically fuses multiple subject LoRAs without training

Generates dynamic masks from cross-attention weights for isolation

Enables seamless multi-subject image generation without auxiliary models
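One way to picture how subject masks could come from cross-attention weights is sketched below. The page does not specify the paper's actual procedure, so the winner-take-all rule, the function name `subject_masks`, and the toy attention values are all assumptions made for illustration. Each image position is assigned to the subject whose activation-word tokens receive the most attention at that position.

```python
def subject_masks(attn, subject_tokens):
    """Derive hard per-subject masks from cross-attention weights.

    attn[p][t]        : attention of image position p on prompt token t
    subject_tokens[k] : prompt token indices of subject k's activation words
    Returns masks[k][p] in {0, 1}. Assumption: each position goes to the
    subject with the highest summed attention (ties go to the lower index).
    """
    num_positions = len(attn)
    num_subjects = len(subject_tokens)
    # Summed attention each subject's tokens receive at every position.
    scores = [
        [sum(row[t] for t in tokens) for row in attn]
        for tokens in subject_tokens
    ]
    masks = [[0] * num_positions for _ in range(num_subjects)]
    for p in range(num_positions):
        winner = max(range(num_subjects), key=lambda k: scores[k][p])
        masks[winner][p] = 1
    return masks


# Toy example: 3 image positions, 3 prompt tokens; token 0 and token 1 are
# the (hypothetical) activation words of subject 0 and subject 1.
attn = [
    [0.7, 0.1, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
]
masks = subject_masks(attn, subject_tokens=[[0], [1]])
```

This simplification assigns every position to some subject, including background positions that belong to none; a practical method would likely need to handle those separately.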

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic fusion of multiple subject LoRAs

Dynamic masks from cross-attention weights

No training or modification to LoRAs required
