🤖 AI Summary
In conditional flow matching (CFM), standard minibatch optimal transport (OT) pairing ignores conditioning information, so the prior becomes conditionally skewed during training, while sampling at test time draws from the full, unbiased prior—a train-test mismatch. This work identifies this mechanism and proposes Conditional Optimal Transport (C²OT): a condition-aware OT formulation that adds a conditional weighting term to the transport cost matrix, aligning the conditional prior seen during training with the unbiased test-time prior for both discrete and continuous conditions. C²OT integrates into the CFM framework without auxiliary networks or additional regularization. Evaluated on 8gaussians-to-moons, CIFAR-10, ImageNet-32x32, and ImageNet-256x256, C²OT outperforms existing baselines overall, achieving superior sample quality and sampling efficiency under limited function-evaluation budgets.
📝 Abstract
Minibatch optimal transport coupling straightens paths in unconditional flow matching. This makes inference computationally cheaper: fewer integration steps and simpler numerical solvers suffice when solving the ordinary differential equation at test time. However, in the conditional setting, minibatch optimal transport falls short, because the default optimal transport mapping disregards conditions, yielding a conditionally skewed prior distribution during training. At test time, in contrast, we have no access to this skewed prior and instead sample from the full, unbiased prior distribution. This gap between training and testing leads to subpar performance. To bridge this gap, we propose conditional optimal transport (C²OT), which adds a conditional weighting term to the cost matrix when computing the optimal transport assignment. Experiments demonstrate that this simple fix works with both discrete and continuous conditions on 8gaussians-to-moons, CIFAR-10, ImageNet-32x32, and ImageNet-256x256. Our method performs better overall than existing baselines across different function evaluation budgets. Code is available at https://hkchengrex.github.io/C2OT
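The abstract does not specify the exact form of the conditional weighting term, so the sketch below only illustrates the general recipe: a condition-dependent penalty matrix is added to the standard minibatch transport cost before the assignment is solved. The function name, the `weight` hyperparameter, and the user-supplied `cond_term` are all hypothetical placeholders, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_pairing_with_condition_term(x0, x1, cond_term, weight=1.0):
    """Minibatch OT pairing with an added condition-dependent cost term.

    x0: (B, D) prior (noise) samples; x1: (B, D) data samples.
    cond_term: (B, B) condition-dependent penalty matrix (its exact form
        is method-specific; here it is a user-supplied placeholder).
    weight: scale of the conditional term (hypothetical hyperparameter).
    Returns x0 and x1 reordered so that row i of each forms a training pair.
    """
    # Standard OT cost: pairwise squared Euclidean distances.
    base = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)
    # Add the conditional weighting term, then solve the exact assignment
    # (Hungarian algorithm; fine for typical minibatch sizes).
    row, col = linear_sum_assignment(base + weight * cond_term)
    return x0[row], x1[col]
```

For discrete conditions, one plausible placeholder for `cond_term` is an indicator-style penalty built from condition labels; for continuous conditions, a distance between condition values could serve the same role. Both are assumptions for illustration, not the paper's exact construction.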