Image-Free Timestep Distillation via Continuous-Time Consistency with Trajectory-Sampled Pairs

📅 2025-11-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing continuous-time consistency distillation methods rely heavily on large-scale training data and substantial computational resources, which hinders deployment in resource-constrained settings. To address this, the paper proposes the Trajectory-Backward Consistency Model (TBCM), a timestep distillation framework that operates **without image inputs**. TBCM extracts latent states directly from teacher-generated trajectories to construct training pairs, which removes the dependence on external datasets and VAE encoding. Because these samples come from the teacher's own generation process, they also narrow the distribution gap between training and inference. Built on the continuous-time consistency framework, TBCM combines this trajectory sampling strategy with a self-contained distillation paradigm. On MJHQ-30k, TBCM achieves an FID of 6.52 and a CLIP score of 28.08 with one-step generation, reduces training time by approximately 40% compared to Sana-Sprint, and substantially lowers GPU memory consumption.

📝 Abstract

Timestep distillation is an effective approach for improving the generation efficiency of diffusion models. The Consistency Model (CM), as a trajectory-based framework, demonstrates significant potential due to its strong theoretical foundation and high-quality few-step generation. Nevertheless, current continuous-time consistency distillation methods still rely heavily on training data and computational resources, hindering their deployment in resource-constrained scenarios and limiting their scalability to diverse domains. To address this issue, we propose the Trajectory-Backward Consistency Model (TBCM), which eliminates the dependence on external training data by extracting latent representations directly from the teacher model's generation trajectory. Unlike conventional methods that require VAE encoding and large-scale datasets, our self-contained distillation paradigm significantly improves both efficiency and simplicity. Moreover, the trajectory-extracted samples naturally bridge the distribution gap between training and inference, thereby enabling more effective knowledge transfer. Empirically, TBCM achieves an FID of 6.52 and a CLIP score of 28.08 on MJHQ-30k under one-step generation, while reducing training time by approximately 40% compared to Sana-Sprint and saving a substantial amount of GPU memory, demonstrating superior efficiency without sacrificing quality. We further reveal the discrepancy between the diffusion space and the generation space in continuous-time consistency distillation and analyze how sampling strategies affect distillation performance, offering insights for future distillation research. GitHub Link: https://github.com/hustvl/TBCM.
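
For readers unfamiliar with the consistency framework the abstract builds on, the general self-consistency property can be written as follows. This is the standard formulation from the consistency-model literature, not an equation taken from this page; the symbols $f_\theta$ (the student), $x_t$ (a state on a teacher ODE trajectory) and the convention that $t = 0$ is the clean end are assumptions made here for illustration.

```latex
% Self-consistency: every point on one trajectory maps to the same output.
f_\theta(x_t, t) = f_\theta(x_{t'}, t')
  \quad \text{for all } t, t' \text{ on the same trajectory},
\qquad
f_\theta(x_0, 0) = x_0 .

% Continuous-time form: the output does not change along the trajectory.
\frac{\mathrm{d}}{\mathrm{d}t} f_\theta(x_t, t)
  = \partial_t f_\theta(x_t, t)
  + \nabla_{x} f_\theta(x_t, t)\,\frac{\mathrm{d}x_t}{\mathrm{d}t}
  = 0 .
```

In a distillation setting the term $\mathrm{d}x_t/\mathrm{d}t$ is supplied by the teacher model, which is why the choice of the states $x_t$ used for training matters.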

Problem

Research questions and friction points this paper is trying to address.

Reducing dependency on external training data for diffusion model distillation

Improving efficiency and reducing computational resources in consistency models

Addressing the distribution gap between training and inference in knowledge transfer

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts latent representations directly from the teacher model's generation trajectory

Eliminates dependence on external training data

Reduces training time and GPU memory usage
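
The trajectory-extraction idea listed above can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_teacher_velocity` is a hypothetical stand-in for a pretrained teacher, the Euler sampler and the uniform choice of trajectory index are assumptions, and all function names are invented for this example. It only shows that (state, timestep) training pairs can be collected from a sampling run that starts from noise, with no images and no VAE encoder involved.

```python
import numpy as np


def toy_teacher_velocity(x, t):
    # Hypothetical teacher: a real one would be a pretrained network
    # predicting dx/dt. Here dx/dt = x, so states shrink as t goes 1 -> 0.
    return x


def generate_trajectory(velocity_fn, x_start, num_steps):
    # Euler integration from t = 1 (noise) down to t = 0, keeping every state.
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    x = x_start
    states = [x.copy()]
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity_fn(x, t_cur)
        states.append(x.copy())
    return ts, np.stack(states)


def sample_training_pairs(ts, states, batch_size, rng):
    # Pick random intermediate states; the final t = 0 state is skipped
    # because rng.integers excludes the upper bound len(ts) - 1.
    idx = rng.integers(0, len(ts) - 1, size=batch_size)
    return states[idx], ts[idx]


rng = np.random.default_rng(0)
noise = rng.standard_normal(4)  # toy 4-dimensional "latent"
ts, states = generate_trajectory(toy_teacher_velocity, noise, num_steps=8)
x_t, t = sample_training_pairs(ts, states, batch_size=3, rng=rng)
print(x_t.shape, t.shape)
```

Each returned pair `(x_t, t)` lies on a trajectory the teacher actually produces at inference time, which is the property the summary credits with narrowing the gap between training and inference.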
