🤖 AI Summary
This work establishes a rigorous theoretical foundation for neural network–based flow matching, addressing the lack of guarantees on convergence, generalization, and generation quality. Focusing on over-parameterized two-layer ReLU neural networks that model conditional velocity fields, the study provides convergence guarantees for gradient descent, generalization bounds for the conditional velocity-field matching objective, and, building on both, bounds on the Wasserstein error of samples generated by the induced flow. The analysis rests on a new generalization theory for multi-task representation learning that applies to unbounded losses. Experiments on synthetic data and real-world image benchmarks are consistent with the theoretical predictions.
📝 Abstract
In this work, we develop a theoretical foundation for flow matching with neural-network-parameterized conditional velocity fields. We establish convergence guarantees for gradient descent in the over-parameterized two-layer ReLU neural network regime. We derive generalization bounds for the conditional velocity-field matching objective. Building on these results, we provide Wasserstein-distance guarantees for the samples generated by the induced flow. Our analysis is based on a generalization bound for multi-task representation learning with unbounded losses, which may be of independent interest beyond flow-based generative modeling. These theoretical results are validated through extensive experiments on both synthetic and real-world image benchmarks.
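For readers unfamiliar with the setting, the following is a minimal NumPy sketch of the kind of pipeline the abstract refers to: a wide two-layer ReLU network is fit to a conditional velocity-field matching objective by full-batch gradient descent, and samples are then generated by integrating the learned flow. Everything specific here is an assumption for illustration and is not taken from the paper: the linear interpolation path `x_t = (1 - t) x0 + t x1` with target velocity `x1 - x0`, the `1/sqrt(width)` output scaling, the Gaussian toy data, the Euler integrator, and all function names and hyperparameters.

```python
import numpy as np

def features(x, t):
    # Network input: position x_t, time t, and a constant 1 acting as a bias term.
    return np.concatenate([x, t, np.ones_like(t)], axis=1)

def init_params(dim, width, rng):
    # Two-layer ReLU network with 1/sqrt(width) output scaling (an assumed choice).
    d_in = dim + 2
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, width))
    W2 = rng.normal(0.0, 1.0, size=(width, dim))
    return [W1, W2]

def velocity(params, x, t):
    W1, W2 = params
    h = np.maximum(features(x, t) @ W1, 0.0)
    return h @ W2 / np.sqrt(W2.shape[0])

def loss_and_grads(params, x0, x1, t):
    # Conditional velocity matching on the assumed linear path:
    # regress v(x_t, t) onto the conditional target x1 - x0.
    W1, W2 = params
    m = W2.shape[0]
    n = x0.shape[0]
    inp = features((1.0 - t) * x0 + t * x1, t)
    pre = inp @ W1
    h = np.maximum(pre, 0.0)
    err = h @ W2 / np.sqrt(m) - (x1 - x0)
    loss = np.sum(err ** 2) / n
    d_out = 2.0 * err / n                       # dLoss/dOutput
    g_W2 = h.T @ d_out / np.sqrt(m)
    d_pre = (d_out @ W2.T / np.sqrt(m)) * (pre > 0)
    g_W1 = inp.T @ d_pre
    return loss, [g_W1, g_W2]

def train(steps=300, lr=0.05, n=512, dim=2, width=256, seed=0):
    rng = np.random.default_rng(seed)
    params = init_params(dim, width, rng)
    x0 = rng.normal(size=(n, dim))              # source: standard Gaussian
    x1 = 2.0 + 0.5 * rng.normal(size=(n, dim))  # toy target distribution
    t = rng.uniform(size=(n, 1))                # one time per training pair
    losses = []
    for _ in range(steps):
        loss, grads = loss_and_grads(params, x0, x1, t)
        losses.append(loss)
        # Plain full-batch gradient descent step.
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, losses

def sample(params, x0, n_steps=50):
    # Forward Euler integration of dx/dt = v(x, t) from t = 0 to t = 1.
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity(params, x, t)
    return x
```

Calling `train()` returns the fitted parameters and the training-loss trajectory; passing fresh Gaussian draws to `sample` pushes them through the learned flow. The sketch illustrates the objects the analysis concerns (training error under gradient descent, and the distribution of generated samples), but it does not reproduce the paper's assumptions or bounds.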