Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation

📅 2024-12-22
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing flow-matching models require numerous function evaluations during sampling, which forces a trade-off between efficiency and generation quality and yields poor consistency in single-step and few-step sampling. This paper proposes a self-corrected flow distillation framework that, for the first time, jointly integrates consistency modeling and adversarial training into the flow-matching paradigm. Leveraging knowledge distillation, the approach enables high-fidelity, highly consistent one-step and few-step text-to-image synthesis while preserving sampling efficiency. Quantitative and qualitative evaluations on CelebA-HQ demonstrate superior performance over state-of-the-art methods, and zero-shot evaluation on COCO shows clear improvements in text-image alignment and fine-grained detail preservation. The implementation is publicly available.
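The "numerous function evaluations" bottleneck comes from integrating the learned ODE at sampling time. As a rough illustration only (not the SCFlow code), the sketch below shows standard conditional flow matching and Euler sampling in PyTorch; the names `v_theta`, `flow_matching_loss`, and `euler_sample` are assumptions for this example.

```python
import torch

def flow_matching_loss(v_theta, x1):
    """Conditional flow-matching loss on a batch of images x1 (B, C, H, W)."""
    x0 = torch.randn_like(x1)                              # Gaussian noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device).view(-1, 1, 1, 1)
    xt = (1 - t) * x0 + t * x1                             # point on the straight path
    target_velocity = x1 - x0                              # constant velocity of that path
    return ((v_theta(xt, t) - target_velocity) ** 2).mean()

@torch.no_grad()
def euler_sample(v_theta, shape, num_steps=50, device="cpu"):
    """Each Euler step costs one network evaluation, hence many evaluations per image."""
    x = torch.randn(shape, device=device)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0], 1, 1, 1), i * dt, device=device)
        x = x + v_theta(x, t) * dt
    return x
```

With `num_steps` in the tens, sampling cost scales linearly with the step count, which is exactly the limitation the distillation framework targets.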

📝 Abstract
Flow matching has emerged as a promising framework for training generative models, demonstrating impressive empirical performance while offering relative ease of training compared to diffusion-based models. However, this method still requires numerous function evaluations in the sampling process. To address these limitations, we introduce a self-corrected flow distillation method that effectively integrates consistency models and adversarial training within the flow-matching framework. This work is a pioneer in achieving consistent generation quality in both few-step and one-step sampling. Our extensive experiments validate the effectiveness of our method, yielding superior results both quantitatively and qualitatively on CelebA-HQ and zero-shot benchmarks on the COCO dataset. Our implementation is released at https://github.com/VinAIResearch/SCFlow
Problem

Research questions and friction points this paper is trying to address.

Improve flow matching for faster text-to-image generation
Achieve consistent quality in few-step and one-step sampling
Combine consistency models and adversarial training effectively (see the sketch after this list)
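One way to picture the combination of consistency modeling and adversarial training is a single distillation step in which a frozen teacher supplies a consistency target for the student's one-step prediction and a discriminator adds an adversarial term. This is a minimal sketch under assumptions; the names `student`, `teacher`, `discriminator`, and `lambda_adv` are hypothetical and not taken from the released implementation.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, discriminator, x1, lambda_adv=0.1):
    """One hypothetical distillation step mixing consistency and adversarial losses."""
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], device=x1.device).view(-1, 1, 1, 1)
    xt = (1 - t) * x0 + t * x1

    # Consistency-style target: the student's one-step jump from x_t should
    # match the frozen teacher's one-step prediction of the clean endpoint.
    with torch.no_grad():
        target_x1 = xt + teacher(xt, t) * (1 - t)
    pred_x1 = xt + student(xt, t) * (1 - t)
    loss_consistency = F.mse_loss(pred_x1, target_x1)

    # Adversarial term: push the student's one-step output toward the data
    # distribution by asking the discriminator to judge it as real.
    loss_adv = F.softplus(-discriminator(pred_x1)).mean()

    return loss_consistency + lambda_adv * loss_adv
```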
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-corrected flow distillation method
Integrates consistency and adversarial training
Achieves consistent few-step and one-step generation (see the sampling sketch after this list)
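After distillation, sampling no longer needs dozens of integration steps. The snippet below is an assumed few-step sampler, not the paper's code; setting `num_steps=1` corresponds to one-step generation with a single network evaluation.

```python
import torch

@torch.no_grad()
def few_step_sample(student, shape, num_steps=1, device="cpu"):
    """Euler sampling with a distilled model: 1-4 steps instead of dozens."""
    x = torch.randn(shape, device=device)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0], 1, 1, 1), i * dt, device=device)
        x = x + student(x, t) * dt
    return x

# One-step generation is a single forward pass of the distilled student:
# images = few_step_sample(student, (4, 3, 256, 256), num_steps=1)
```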
Authors

Quan Dao
VinAI Research, Rutgers University
Hao Phung
CS PhD Student, Cornell University
Generative Models
Trung Dao
Senior Machine Learning Engineer, Qualcomm Research
Computer Vision, Generative Models
Dimitris Metaxas
Rutgers University
Anh Tran
VinAI Research