k-Sample inference via Multimarginal Optimal Transport

📅 2025-01-10
🤖 AI Summary
This paper addresses the problem of simultaneous homogeneity testing for $k \geq 2$ distributions supported on finite subsets of $\mathbb{R}^d$. We propose a nonparametric $k$-sample test based on multimarginal optimal transport (MOT), establishing, for the first time, the asymptotic distribution theory of the MOT statistic under both the null hypothesis (all distributions identical) and the alternative (at least two differ). We design a low-complexity linear program to approximate the critical value of the test and prove that bootstrap resampling consistently estimates the asymptotic distributions. The method thus pairs statistical validity with computational tractability. Empirical evaluation on synthetic data and real-world U.S. cancer data (2004–2020) demonstrates its consistency, statistical power, and computational efficiency. To our knowledge, this is the first MOT-based framework for multi-distribution comparison with provable theoretical guarantees.

📝 Abstract
This paper proposes a Multimarginal Optimal Transport (MOT) approach for simultaneously comparing $k \geq 2$ measures supported on finite subsets of $\mathbb{R}^d$, $d \geq 1$. We derive the asymptotic distributions of the optimal value of the empirical MOT program under the null hypothesis that all $k$ measures are the same, and under the alternative hypothesis that at least two measures are different. We use these results to construct a test of the null hypothesis and provide consistency and power guarantees for this $k$-sample test. We consistently estimate the asymptotic distributions using the bootstrap, and propose a low-complexity linear program to approximate the test cut-off. We demonstrate the advantages of our approach on synthetic and real datasets, including data on cancers in the United States from 2004 to 2020.
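
To make the "empirical MOT program" concrete, here is a minimal sketch of a multimarginal OT linear program for $k$ measures on a shared finite support. It is not taken from the paper: the function name `mot_value`, the brute-force enumeration of all index tuples, and the cost (sum of squared distances from the $k$ points to their mean) are assumptions chosen for illustration. Only `numpy`, `itertools`, and `scipy.optimize.linprog` are used.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def mot_value(support, weights):
    """Optimal value of a multimarginal OT linear program (illustrative).

    support: (m, d) array of support points shared by all measures.
    weights: (k, m) array; row i is the probability vector of measure i.
    Cost (an assumption, not the paper's choice): for a tuple of k points,
    the sum of squared distances from each point to the tuple's mean.
    """
    support = np.asarray(support, dtype=float)
    weights = np.asarray(weights, dtype=float)
    k, m = weights.shape

    # One LP variable per index tuple (j_1, ..., j_k): m**k variables in total,
    # so this brute-force formulation is only practical for small m and k.
    idx = np.array(list(itertools.product(range(m), repeat=k)))  # (m**k, k)
    pts = support[idx]                                           # (m**k, k, d)
    center = pts.mean(axis=1, keepdims=True)
    cost = ((pts - center) ** 2).sum(axis=(1, 2))

    # Marginal constraints: for measure i and support point j, the total mass
    # of tuples whose i-th index equals j must be weights[i, j].
    A_eq = np.zeros((k * m, len(idx)))
    for i in range(k):
        for j in range(m):
            A_eq[i * m + j] = idx[:, i] == j
    b_eq = weights.reshape(-1)

    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


support = np.array([[0.0], [1.0], [2.0]])
same = np.tile([0.5, 0.3, 0.2], (3, 1))   # three identical measures
dirac = np.eye(3)                         # point masses at 0, 1 and 2
print(mot_value(support, same))
print(mot_value(support, dirac))
```

Under this assumed cost, identical measures give a value of zero up to solver tolerance (all mass can sit on tuples that repeat a single point), while the three point masses at 0, 1, and 2 give a value of 2. A statistic of this kind is what a $k$-sample test would compare against a cut-off.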

Problem

Research questions and friction points this paper is trying to address.

Simultaneously comparing k measures using Multimarginal Optimal Transport
Deriving asymptotic distributions of the empirical MOT value under the null and alternative hypotheses
Constructing consistent k-sample tests with bootstrap estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimarginal Optimal Transport for k-sample comparison
Asymptotic distribution derivation for hypothesis testing
Bootstrap estimation of the asymptotic distributions and a low-complexity linear program for approximating the test cut-off
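
The role of resampling in calibrating such a test can be illustrated with a generic pooled bootstrap. This is a swapped-in textbook technique, not the paper's bootstrap scheme or its linear-program approximation of the cut-off, and its validity for the MOT statistic is not claimed here. The helper names (`mot_value`, `empirical_weights`, `pooled_bootstrap_pvalue`), the cost function, and the encoding of samples as integer labels into the support are all assumptions made for this sketch.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def mot_value(support, weights):
    """Brute-force multimarginal OT value; cost is an illustrative assumption
    (sum of squared distances from the k points to their mean)."""
    support = np.asarray(support, dtype=float)
    weights = np.asarray(weights, dtype=float)
    k, m = weights.shape
    idx = np.array(list(itertools.product(range(m), repeat=k)))
    pts = support[idx]
    cost = ((pts - pts.mean(axis=1, keepdims=True)) ** 2).sum(axis=(1, 2))
    A_eq = np.zeros((k * m, len(idx)))
    for i in range(k):
        for j in range(m):
            A_eq[i * m + j] = idx[:, i] == j
    res = linprog(cost, A_eq=A_eq, b_eq=weights.reshape(-1),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


def empirical_weights(sample, m):
    """Relative frequencies of the integer labels 0..m-1 in one sample."""
    return np.bincount(sample, minlength=m) / len(sample)


def pooled_bootstrap_pvalue(samples, support, n_boot=200, seed=0):
    """Generic pooled-bootstrap p-value for a k-sample statistic.

    samples: list of k integer arrays; each entry indexes a row of `support`.
    Resamples every group from the pooled data, which mimics the null
    hypothesis that all k measures coincide.
    """
    rng = np.random.default_rng(seed)
    m = len(support)
    observed = mot_value(
        support, np.array([empirical_weights(s, m) for s in samples]))
    pooled = np.concatenate(samples)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        resampled = [rng.choice(pooled, size=len(s), replace=True)
                     for s in samples]
        boot[b] = mot_value(
            support, np.array([empirical_weights(r, m) for r in resampled]))
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(boot >= observed)) / (1 + n_boot)
    return observed, p_value


rng = np.random.default_rng(1)
support = np.array([[0.0], [1.0], [2.0]])
probs = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5], [0.3, 0.4, 0.3]]
samples = [rng.choice(3, size=60, p=p) for p in probs]
stat, p_value = pooled_bootstrap_pvalue(samples, support)
print(f"statistic={stat:.4f}, p-value={p_value:.3f}")
```

Each bootstrap replicate solves one small linear program, so the cost of this naive calibration grows with both `n_boot` and the $m^k$ variables of the program. That cost is the kind of burden a cheaper approximation of the cut-off is meant to avoid.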