🤖 AI Summary
Traditional optimal transport suffers from high computational cost and limited interpretability when applied to large-scale data. This work proposes an Optimal Mixture Transport (OMT) framework that lifts the unit of transport from individual samples to subpopulations represented as mixture components. By modeling subpopulations with exponential-family distributions, the problem is reformulated as a strictly biconvex optimization problem, yielding, for the first time, a mixture transport method with guarantees of a unique global solution and of stability. Notably, the computational complexity of OMT depends only on the number of mixture components and is decoupled from the sample size. Experiments on synthetic data, image tasks, and large-scale single-cell RNA sequencing show that OMT achieves superior efficiency, stability, and interpretability.
📝 Abstract
Optimal transport (OT) provides a principled framework for mapping between probability distributions. Despite extensive progress, applying OT to large-scale data remains computationally demanding, and the resulting pointwise transport plans are often difficult to interpret. We introduce Optimal Mixture Transport (OMT), a scalable framework that shifts the transport paradigm from individual samples to mixtures of subpopulations, reformulating the transport problem as a strictly biconvex optimization problem with a unique global minimizer. We further establish theoretical guarantees on the stability of the OMT map, showing that bounded perturbations of the underlying distributions lead to bounded changes in the transport plan. By modeling subpopulations as exponential-family distributions, OMT decouples computational complexity from the sample size, so that it scales solely with the number of mixture components. We demonstrate the effectiveness and practicality of OMT on a wide range of synthetic benchmarks and real-world datasets, including image data and large-scale single-cell RNA sequencing measurements.
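The abstract does not spell out the OMT objective, so the sketch below does not reproduce it. It only illustrates the general idea of transporting at the component level, under stated assumptions: each distribution is summarized by a mixture of hypothetical 1-D Gaussian components (a simple exponential-family case) given as `(weight, mean, std)` tuples, the cost between two components is their closed-form squared 2-Wasserstein distance, and a small discrete OT problem is solved over the mixture weights. The solver is the standard entropic Sinkhorn iteration, swapped in for illustration; it is not the strictly biconvex formulation described above, and the names `gaussian_w2_sq`, `sinkhorn`, and `component_level_plan` are hypothetical.

```python
import math

# Assumption: each mixture component is a hypothetical (weight, mean, std)
# tuple for a 1-D Gaussian. This is NOT the OMT formulation itself.


def gaussian_w2_sq(m1, s1, m2, s2):
    """Closed-form squared 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


def sinkhorn(a, b, cost, eps=0.05, n_iter=500):
    """Entropy-regularized OT between discrete weight vectors a and b.

    The cost matrix is rescaled to [0, 1] so that exp(-cost / eps)
    stays well away from floating-point underflow.
    """
    cmax = max(max(row) for row in cost) or 1.0
    kern = [[math.exp(-c / (cmax * eps)) for c in row] for row in cost]
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kern[i][j] * v[j] for j in range(len(b)))
             for i in range(len(a))]
        v = [b[j] / sum(kern[i][j] * u[i] for i in range(len(a)))
             for j in range(len(b))]
    return [[u[i] * kern[i][j] * v[j] for j in range(len(b))]
            for i in range(len(a))]


def component_level_plan(source, target, eps=0.05):
    """Transport plan between mixture components rather than individual samples.

    Each iteration costs O(K_src * K_tgt) and never touches raw samples.
    """
    a = [w for w, _, _ in source]
    b = [w for w, _, _ in target]
    cost = [[gaussian_w2_sq(m1, s1, m2, s2) for _, m2, s2 in target]
            for _, m1, s1 in source]
    return sinkhorn(a, b, cost, eps)


if __name__ == "__main__":
    src = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
    tgt = [(0.5, 5.2, 1.0), (0.5, 0.1, 1.0)]
    for row in component_level_plan(src, tgt):
        print([round(x, 3) for x in row])
```

Because the plan is a K_src × K_tgt matrix, its size does not depend on how many samples were used to fit the mixtures, which mirrors the scaling property highlighted above. In this toy example, the source component centered at 0 should send nearly all of its mass to the target component at 0.1, and the one at 5 to the one at 5.2.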