🤖 AI Summary
This paper addresses simultaneous homogeneity testing for $k \geq 2$ multivariate distributions supported on finite subsets of $\mathbb{R}^d$. We propose a nonparametric $k$-sample test based on multimarginal optimal transport (MOT). We derive the asymptotic distribution of the empirical MOT statistic under both the null hypothesis (all $k$ distributions are identical) and the alternative (at least two differ). We show that the bootstrap consistently estimates these asymptotic distributions, and we propose a low-complexity linear program that approximates the critical value (test cut-off). Together, these results yield a test with consistency and power guarantees that remains computationally tractable. Experiments on synthetic data and on real U.S. cancer data (2004–2020) demonstrate its consistency, statistical power, and computational efficiency. To our knowledge, this is the first MOT-based framework for comparing multiple distributions with provable theoretical guarantees.
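As a point of reference, the empirical MOT statistic can be sketched as the linear program below. The sketch rests on assumptions that the summary does not state:

- The $k$ measures share a common finite support $\{z_1,\dots,z_m\} \subset \mathbb{R}^d$.
- $\hat r^{(j)}_a$ is the fraction of the $j$-th sample located at $z_a$.
- The cost $c$ is the sum of pairwise squared distances. This choice is for illustration and is not necessarily the cost used in the paper.

Because this $c$ is nonnegative and vanishes exactly on the diagonal $x_1 = \dots = x_k$, the population version of the program equals zero if and only if all $k$ measures coincide. That property is what makes its optimal value a natural $k$-sample statistic.

```latex
\begin{aligned}
\widehat{\mathrm{MOT}}
  &= \min_{\pi \geq 0} \sum_{i_1,\dots,i_k=1}^{m}
     c\bigl(z_{i_1},\dots,z_{i_k}\bigr)\, \pi_{i_1 \cdots i_k}
  \quad \text{s.t.} \quad
  \sum_{(i_1,\dots,i_k)\,:\, i_j = a} \pi_{i_1 \cdots i_k} = \hat r^{(j)}_a,
  \quad j = 1,\dots,k,\; a = 1,\dots,m, \\
c(x_1,\dots,x_k) &= \sum_{1 \leq j < l \leq k} \lVert x_j - x_l \rVert_2^2 .
\end{aligned}
```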
📝 Abstract
This paper proposes a multimarginal optimal transport (MOT) approach for simultaneously comparing $k \geq 2$ measures supported on finite subsets of $\mathbb{R}^d$, $d \geq 1$. We derive the asymptotic distributions of the optimal value of the empirical MOT program under the null hypothesis that all $k$ measures are the same, and under the alternative hypothesis that at least two measures are different. We use these results to construct a test of the null hypothesis and provide consistency and power guarantees for this $k$-sample test. We consistently estimate the asymptotic distributions using the bootstrap, and we propose a low-complexity linear program to approximate the test cut-off. We demonstrate the advantages of our approach on synthetic and real datasets, including real data on cancers in the United States from 2004 to 2020.
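The following is a minimal Python sketch of the two ingredients named above, written under explicit assumptions. It solves the empirical MOT program exactly as a linear program with SciPy's `linprog` (HiGHS backend), then calibrates the statistic with a permutation test.

Several pieces are illustrative assumptions rather than details from the paper:

- the cost (sum of pairwise squared distances);
- the common-support encoding of the samples;
- the helper names `mot_value`, `empirical_marginals`, and `permutation_pvalue`;
- the permutation calibration itself.

The paper instead uses bootstrap estimates of the asymptotic distributions and a low-complexity linear program to approximate the cut-off. Neither of those is reproduced here.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def mot_value(support, marginals):
    """Optimal value of the discrete MOT linear program (illustrative sketch).

    support:   (m, d) array of points shared by all k measures (assumed common support).
    marginals: (k, m) array; row j is the probability vector of measure j.
    Cost (assumption, not necessarily the paper's): c(x_1, ..., x_k) = sum_{j<l} ||x_j - x_l||^2,
    which is zero exactly on the diagonal x_1 = ... = x_k.
    """
    k, m = marginals.shape
    # One LP variable per multi-index (i_1, ..., i_k): m**k variables in total.
    tuples = list(itertools.product(range(m), repeat=k))
    # Pairwise squared Euclidean distances between support points, shape (m, m).
    sq = ((support[:, None, :] - support[None, :, :]) ** 2).sum(axis=-1)
    cost = np.array(
        [sum(sq[t[a], t[b]] for a in range(k) for b in range(a + 1, k)) for t in tuples]
    )
    # Marginal constraints: total mass on tuples with i_j == a must equal marginals[j, a].
    A_eq = np.zeros((k * m, len(tuples)))
    for col, t in enumerate(tuples):
        for j in range(k):
            A_eq[j * m + t[j], col] = 1.0
    res = linprog(cost, A_eq=A_eq, b_eq=marginals.reshape(-1),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


def empirical_marginals(samples, m):
    """Convert k samples, coded as indices into the support, to a (k, m) frequency array."""
    return np.array([np.bincount(s, minlength=m) / len(s) for s in samples])


def permutation_pvalue(support, samples, n_perm=99, seed=0):
    """Permutation calibration of the MOT statistic (swapped in for illustration;
    it is not the bootstrap procedure described in the abstract)."""
    rng = np.random.default_rng(seed)
    m = len(support)
    observed = mot_value(support, empirical_marginals(samples, m))
    pooled = np.concatenate(samples)
    cuts = np.cumsum([len(s) for s in samples])[:-1]  # split points that preserve sample sizes
    exceed = 0
    for _ in range(n_perm):
        groups = np.split(rng.permutation(pooled), cuts)
        if mot_value(support, empirical_marginals(groups, m)) >= observed - 1e-12:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


# Usage: k = 3 toy samples on the support {0, 1, 2} in R^1 (synthetic, not the paper's data).
support = np.array([[0.0], [1.0], [2.0]])
rng = np.random.default_rng(1)
same = [rng.choice(3, size=60, p=[0.3, 0.4, 0.3]) for _ in range(3)]
diff = same[:2] + [rng.choice(3, size=60, p=[0.05, 0.15, 0.8])]
print("H0 true: ", permutation_pvalue(support, same))
print("H0 false:", permutation_pvalue(support, diff))
```

The exact program has $m^k$ variables, so this direct formulation is only practical for small $k$ and small supports. That growth illustrates why a low-complexity approximation of the cut-off is attractive.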