🤖 AI Summary
This work addresses high-fidelity, identity-preserving, and robust makeup transfer without auxiliary facial control modules. We propose RefLoRAInjector, a lightweight makeup feature injector that decouples the reference image pathway from the backbone network, enabling end-to-end makeup feature learning from source–reference image pairs. Built upon the FLUX-Kontext diffusion Transformer architecture, our method integrates conditional image input with the RefLoRAInjector structure and introduces a high-precision paired makeup data generation pipeline to enhance supervision quality. Experiments demonstrate that our approach significantly outperforms existing methods across diverse scenarios, achieving state-of-the-art performance in makeup fidelity, identity consistency, and robustness to cross-pose and cross-illumination variations, while eliminating reliance on complex auxiliary components such as facial landmarks or 3D morphable models.
📝 Abstract
Makeup transfer aims to apply the makeup style from a reference face to a target face and has been increasingly adopted in practical applications. Existing GAN-based approaches typically rely on carefully designed loss functions to balance transfer quality and facial identity consistency, while diffusion-based methods often depend on additional face-control modules or algorithms to preserve identity. However, these auxiliary components tend to introduce extra errors, leading to suboptimal transfer results. To overcome these limitations, we propose FLUX-Makeup, a high-fidelity, identity-consistent, and robust makeup transfer framework that eliminates the need for any auxiliary face-control components. Instead, our method directly leverages source–reference image pairs to achieve superior transfer performance. Specifically, we build our framework upon FLUX-Kontext, using the source image as its native conditional input. Furthermore, we introduce RefLoRAInjector, a lightweight makeup feature injector that decouples the reference pathway from the backbone, enabling efficient and comprehensive extraction of makeup-related information. In parallel, we design a robust and scalable data generation pipeline that provides more accurate supervision during training; the paired makeup datasets it produces significantly surpass all existing datasets in quality. Extensive experiments demonstrate that FLUX-Makeup achieves state-of-the-art performance, exhibiting strong robustness across diverse scenarios.
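The core idea behind a LoRA-style reference injector can be illustrated in a minimal sketch. Note this is an illustration of the general LoRA-injection pattern under assumed details, not the paper's actual implementation: the dimensions, the single projection layer, and the `project` helper are all hypothetical, and the real RefLoRAInjector operates inside the FLUX-Kontext diffusion Transformer. The key property shown is the decoupling: the frozen backbone weight serves both pathways, while the trainable low-rank update touches only reference (makeup) tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4  # hidden size and LoRA rank (illustrative values, not from the paper)

# Frozen backbone projection weight (stands in for an attention projection in the DiT).
W = rng.standard_normal((d, d)) * 0.02

# Low-rank factors for the reference pathway: delta_W = (alpha / r) * B @ A.
A = rng.standard_normal((r, d)) * 0.02
B = np.zeros((d, r))  # B starts at zero, so the injector is a no-op at initialization
alpha = 8.0


def project(tokens: np.ndarray, is_reference: bool) -> np.ndarray:
    """Apply the frozen projection; add the low-rank update only on the reference path."""
    out = tokens @ W.T
    if is_reference:
        out = out + (alpha / r) * (tokens @ A.T) @ B.T
    return out


src = rng.standard_normal((16, d))  # source-image tokens
ref = rng.standard_normal((16, d))  # reference (makeup) tokens

# At initialization B == 0, so both pathways reduce to the frozen backbone,
# and the source pathway is never touched by the injector at all.
assert np.allclose(project(ref, True), ref @ W.T)
assert np.allclose(project(src, False), src @ W.T)
```

Because only `A` and `B` receive gradients during training, the backbone's generative prior is preserved while the injector learns to route makeup-related information from the reference tokens, which is what makes the added pathway lightweight.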