🤖 AI Summary
Existing methods for cross-identity makeup transfer struggle to preserve makeup fidelity and identity consistency simultaneously, largely because structured paired data (source–result pairs sharing identity, and reference–result pairs sharing makeup style) does not exist. To address this, we propose EvoMakeup, a unified multimodal framework for full-face and partial image-guided makeup editing as well as text-driven makeup editing, alongside MakeupQuad, the first quadruplet dataset designed explicitly for makeup editing, pairing non-makeup source faces, makeup references, edited results, and textual makeup descriptions. EvoMakeup combines identity–makeup disentangled representation learning, multi-stage knowledge distillation, and a shared generative architecture, enabling co-optimization of data and model through iterative refinement (see the sketch below). Evaluated on real-world benchmarks, EvoMakeup significantly outperforms state-of-the-art methods, unifying diverse editing paradigms within a single model while improving makeup detail fidelity, identity preservation, and user controllability.
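The summary states the data–model co-optimization only at a high level; the sketch below illustrates one plausible reading of such an iterative refinement loop. All names (`train_editor`, `generate_quadruples`, `passes_fidelity_check`) and the filtering criterion are hypothetical illustrations, not the paper's actual API or algorithm.

```python
# Hedged sketch of an iterative data/model co-optimization loop, assuming
# the model is retrained on its own filtered synthetic quadruples each
# round. All function names and stub behaviors are assumptions.
from typing import List, Tuple

Sample = Tuple[str, str, str, str]  # (source, reference, result, caption)

def train_editor(model: dict, data: List[Sample]) -> dict:
    """Stub: fine-tune the makeup editor on the current dataset."""
    return {**model, "rounds": model.get("rounds", 0) + 1}

def generate_quadruples(model: dict, n: int) -> List[Sample]:
    """Stub: synthesize new (source, reference, result, caption) tuples."""
    return [(f"src_{i}.png", f"ref_{i}.png", f"out_{i}.png", "red lipstick")
            for i in range(n)]

def passes_fidelity_check(sample: Sample) -> bool:
    """Stub: keep only samples that preserve identity and makeup detail."""
    return True  # placeholder for an identity/makeup scoring model

def co_optimize(num_rounds: int = 3, batch: int = 8) -> dict:
    model: dict = {}
    data = generate_quadruples(model, batch)           # seed synthetic data
    for _ in range(num_rounds):
        model = train_editor(model, data)              # better model from current data
        candidates = generate_quadruples(model, batch) # better data from current model
        data += [s for s in candidates if passes_fidelity_check(s)]
    return model

if __name__ == "__main__":
    print(co_optimize())
```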
📝 Abstract
Facial makeup editing aims to realistically transfer makeup from a reference to a target face. Existing methods often produce low-quality results with coarse makeup details and struggle to preserve both identity and makeup fidelity, mainly due to the lack of structured paired data -- where source and result share identity, and reference and result share identical makeup. To address this, we introduce MakeupQuad, a large-scale, high-quality dataset with non-makeup faces, references, edited results, and textual makeup descriptions. Building on this, we propose EvoMakeup, a unified training framework that mitigates image degradation during multi-stage distillation, enabling iterative improvement of both data and model quality. Although trained solely on synthetic data, EvoMakeup generalizes well and outperforms prior methods on real-world benchmarks. It supports high-fidelity, controllable, multi-task makeup editing -- including full-face and partial reference-based editing, as well as text-driven makeup editing -- within a single model. Experimental results demonstrate that our method achieves superior makeup fidelity and identity preservation, effectively balancing both aspects. Code and dataset will be released upon acceptance.
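For concreteness, one MakeupQuad record might be represented as below. The four components (non-makeup face, reference, edited result, textual makeup description) come directly from the abstract; the field names, path-based representation, and example values are our assumptions for illustration.

```python
# Minimal sketch of a single MakeupQuad record, assuming images are
# stored as file paths. Field names and example values are hypothetical.
from dataclasses import dataclass

@dataclass
class MakeupQuadSample:
    source_path: str     # non-makeup face
    reference_path: str  # face wearing the target makeup
    result_path: str     # edited result: source identity + reference makeup
    description: str     # textual makeup description

sample = MakeupQuadSample(
    source_path="faces/0001_bare.png",
    reference_path="faces/0001_ref.png",
    result_path="faces/0001_edit.png",
    description="smoky eye shadow with matte red lips",
)
print(sample.description)
```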