🤖 AI Summary
Problem: Single-step diffusion models lack a unified theoretical foundation and modular, decoupled design principles. Method: This paper introduces a systematic design paradigm for shortcut models, establishing a general theoretical framework that justifies the validity of flow-path truncation modeling. It decouples three core components (noise scheduling, training objective, and network architecture), enabling modular innovation without relying on pretraining, knowledge distillation, or curriculum learning. The approach supports end-to-end single-step training and classifier-free guidance. Contribution/Results: On ImageNet-256×256, it achieves an FID50k of 2.85 under classifier-free guidance, setting a new state of the art for single-step diffusion models. The work offers a paradigm for efficient generative modeling that combines theoretical rigor with engineering flexibility.
📝 Abstract
Recent few-step diffusion models have proven efficient and effective by shortcutting the probabilistic paths of diffusion models, most notably when training one-step diffusion models from scratch (a.k.a. shortcut models). However, their theoretical derivation and practical implementation are often closely coupled, which obscures the design space. To address this, we propose a common design framework for representative shortcut models. This framework provides theoretical justification for their validity and disentangles concrete component-level choices, thereby enabling systematic identification of improvements. With our proposed improvements, the resulting one-step model achieves a new state-of-the-art FID50k of 2.85 on ImageNet-256×256 under the classifier-free guidance setting. Remarkably, the model requires no pre-training, distillation, or curriculum learning. We believe our work lowers the barrier to component-level innovation in shortcut models and facilitates principled exploration of their design space.