🤖 AI Summary
Conventional flow map distillation draws its training samples from an external dataset, which risks a mismatch between that dataset and what the teacher model can actually generate. This work proposes a data-free flow map distillation framework: it samples only from the prior distribution, removing the data dependency, and trains the student to predict the teacher's sampling path while correcting its own compounding errors so that the predicted trajectories stay faithful to the teacher. The method needs only a pre-trained teacher model and no external dataset. Distilled from SiT-XL/2+REPA and evaluated on ImageNet at 256×256 and 512×512, it achieves single-step FID scores of 1.45 and 1.49, respectively, surpassing all data-based flow map distillation counterparts. The authors present this as a more robust paradigm for accelerating generative flow models.
📝 Abstract
State-of-the-art flow models achieve remarkable quality but require slow, iterative sampling. To accelerate this, flow maps can be distilled from pre-trained teachers, a procedure that conventionally requires sampling from an external dataset. We argue that this data dependency introduces a fundamental risk of Teacher-Data Mismatch, as a static dataset may provide an incomplete or even misaligned representation of the teacher's full generative capabilities. This leads us to question whether this reliance on data is truly necessary for successful flow map distillation. In this work, we explore a data-free alternative that samples only from the prior distribution, a distribution the teacher is guaranteed to follow by construction, thereby circumventing the mismatch risk entirely. To demonstrate the practical viability of this philosophy, we introduce a principled framework that learns to predict the teacher's sampling path while actively correcting for its own compounding errors to ensure high fidelity. Our approach surpasses all data-based counterparts and establishes a new state-of-the-art by a significant margin. Specifically, distilling from SiT-XL/2+REPA, our method reaches an FID of 1.45 on ImageNet 256×256 and 1.49 on ImageNet 512×512, both with a single sampling step. We hope our work establishes a more robust paradigm for accelerating generative models and motivates the broader adoption of flow map distillation without data.
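
To make the data-free idea concrete, the following toy sketch may help. It is not the paper's method: it is a plain endpoint-regression baseline under simplifying assumptions, without the path prediction or self-correction the abstract describes. The teacher is a hypothetical 1D linear velocity field, the prior is a standard Gaussian placed at t = 0, and the student is a single scalar weight. The names `teacher_velocity`, `teacher_rollout`, and `distill_one_step` are illustrative only. The point it shows is that every training input comes from the prior and every target comes from the teacher, so no external dataset appears anywhere.

```python
import math
import random


def teacher_velocity(x, t):
    # Hypothetical toy teacher (assumption): linear field dx/dt = -0.5 * x.
    # A real teacher would be a pre-trained network such as SiT-XL/2+REPA.
    return -0.5 * x


def teacher_rollout(z, num_steps=100):
    # Slow, iterative sampling: Euler integration of the teacher from t=0 to t=1.
    x = z
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * teacher_velocity(x, i * dt)
    return x


def distill_one_step(num_iters=2000, lr=0.05, seed=0):
    # Student one-step map f(z) = w * z, fitted by SGD on squared error.
    rng = random.Random(seed)
    w = 1.0
    for _ in range(num_iters):
        z = rng.gauss(0.0, 1.0)        # input drawn from the prior only
        target = teacher_rollout(z)    # target produced by the teacher only
        pred = w * z
        grad = 2.0 * (pred - target) * z
        w -= lr * grad
    return w


if __name__ == "__main__":
    w = distill_one_step()
    print("student weight:", w)
    print("teacher 100-step map of z=1:", teacher_rollout(1.0))
```

In this toy setting the student's single multiplication reproduces what the teacher needs 100 Euler steps to compute. For a nonlinear teacher, a one-step student trained this way would generally accumulate error, which is the gap the paper's error-correcting framework is meant to address.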