Demystifying Transition Matching: When and Why It Can Beat Flow Matching

📅 2025-10-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the conditions and mechanisms under which Transition Matching (TM) outperforms Flow Matching (FM). Focusing on target distributions exhibiting modality separation and large variance—e.g., Gaussian mixtures—we propose a TM analysis framework based on stochastic differential updates of latent variables. We theoretically establish that, within a finite number of steps, TM achieves a smaller KL divergence to the target, exactly preserves the target covariance, and converges faster than FM. Through rigorous KL divergence bounds and convergence analysis, we identify the root causes of TM’s superior sampling efficiency and generation quality. Experiments confirm these advantages in image and video generation: TM attains significantly better FID and LPIPS scores with fewer sampling steps. Our core contribution is the first systematic derivation of sufficient conditions for TM’s superiority over FM, along with a formal characterization of its covariance preservation property and accelerated convergence mechanism.

Technology Category

Application Category

📝 Abstract
Flow Matching (FM) underpins many state-of-the-art generative models, yet recent results indicate that Transition Matching (TM) can achieve higher quality with fewer sampling steps. This work answers the question of when and why TM outperforms FM. First, when the target is a unimodal Gaussian distribution, we prove that TM attains strictly lower KL divergence than FM for finite number of steps. The improvement arises from stochastic difference latent updates in TM, which preserve target covariance that deterministic FM underestimates. We then characterize convergence rates, showing that TM achieves faster convergence than FM under a fixed compute budget, establishing its advantage in the unimodal Gaussian setting. Second, we extend the analysis to Gaussian mixtures and identify local-unimodality regimes in which the sampling dynamics approximate the unimodal case, where TM can outperform FM. The approximation error decreases as the minimal distance between component means increases, highlighting that TM is favored when the modes are well separated. However, when the target variance approaches zero, each TM update converges to the FM update, and the performance advantage of TM diminishes. In summary, we show that TM outperforms FM when the target distribution has well-separated modes and non-negligible variances. We validate our theoretical results with controlled experiments on Gaussian distributions, and extend the comparison to real-world applications in image and video generation.
Problem

Research questions and friction points this paper is trying to address.

Analyzing when Transition Matching outperforms Flow Matching in generative models
Explaining why TM achieves lower KL divergence with stochastic latent updates
Identifying conditions where TM excels on well-separated multimodal distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transition Matching uses stochastic latent updates preserving covariance
TM achieves faster convergence than FM under fixed compute
TM outperforms FM with well-separated modes and non-zero variance