AI Summary
This study addresses two-sample homogeneity testing with an approach based on entropy-regularized optimal transport (EOT) maps. The test statistic is the squared L² distance between two empirical EOT maps, each transporting a common reference distribution (the uniform law on the unit ball) to one of the samples, and its non-pivotal null distribution is calibrated via a weighted multiplier bootstrap. Building the test on EOT maps allows it not only to detect global distributional discrepancies but also to diagnose how the two distributions differ. For a fixed regularization parameter, the theoretical analysis establishes a Gaussian quadratic form as the limiting null distribution, proves consistency against fixed alternatives, and derives asymptotic power under local alternatives. Numerical experiments show reliable finite-sample size control and power competitive with existing methods, with particular sensitivity to location shifts; a real-data analysis illustrates the method's diagnostic capability.
Abstract
This paper proposes a two-sample homogeneity test based on entropic optimal transport (EOT) maps from a common reference distribution, the uniform law on the unit ball. The test statistic is the squared $L^2$-distance between the two empirical EOT maps. For a fixed entropic regularization parameter, we prove that the population map discrepancy is identifiable, derive a functional central limit theorem for the empirical map difference under the null, and establish the Gaussian quadratic-form null limit. We also prove consistency against fixed alternatives and characterize local asymptotic power under contiguous alternatives. A weighted multiplier bootstrap is proposed to calibrate the non-pivotal null distribution, and its validity is established. Extensive simulations demonstrate that the proposed EOT-map test has reliable finite-sample size control and exhibits competitive power compared with existing methods. The method is particularly powerful for location alternatives and, beyond a single scalar discrepancy, it provides additional diagnostic information on how the two distributions differ. Finally, a real data application concludes the paper.
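To make the construction concrete, the statistic described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the reference law is approximated by a finite cloud of points drawn uniformly from the unit ball, each empirical EOT map is taken to be the barycentric projection of a Sinkhorn plan with quadratic cost, and the null distribution is calibrated by a plain permutation scheme rather than the weighted multiplier bootstrap the paper proposes (whose details are not given here). All function names and default values (`eps`, `n_iter`, `n_perm`) are hypothetical.

```python
import numpy as np


def sample_unit_ball(m, d, rng):
    """Draw m points uniformly from the unit ball in R^d."""
    g = rng.standard_normal((m, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)  # uniform directions
    r = rng.random(m) ** (1.0 / d)                 # radii with density ~ r^(d-1)
    return g * r[:, None]


def _logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    amax = a.max(axis=axis, keepdims=True)
    return np.squeeze(amax, axis=axis) + np.log(np.exp(a - amax).sum(axis=axis))


def eot_map(ref, x, eps=0.5, n_iter=300):
    """Assumed empirical EOT map: barycentric projection of the Sinkhorn plan
    between the reference points and the sample x (cost 0.5 * ||u - x||^2)."""
    m, n = len(ref), len(x)
    cost = 0.5 * ((ref[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    f, g = np.zeros(m), np.zeros(n)
    log_a, log_b = -np.log(m), -np.log(n)  # uniform weights on both sides
    for _ in range(n_iter):
        # Log-domain Sinkhorn updates of the two dual potentials.
        f = -eps * _logsumexp((g[None, :] - cost) / eps + log_b, axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps + log_a, axis=0)
    log_p = (f[:, None] + g[None, :] - cost) / eps
    log_p -= _logsumexp(log_p, axis=1)[:, None]  # conditional law given each ref point
    return np.exp(log_p) @ x                     # conditional mean = map value


def eot_statistic(ref, x, y, eps=0.5):
    """Squared L^2 distance between the two maps, averaged over reference points."""
    diff = eot_map(ref, x, eps) - eot_map(ref, y, eps)
    return float(np.mean((diff ** 2).sum(axis=1)))


def permutation_pvalue(ref, x, y, eps=0.5, n_perm=100, rng=None):
    """Permutation calibration (a stand-in, NOT the paper's multiplier bootstrap)."""
    rng = np.random.default_rng() if rng is None else rng
    observed = eot_statistic(ref, x, y, eps)
    pooled, n = np.vstack([x, y]), len(x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = eot_statistic(ref, pooled[idx[:n]], pooled[idx[n:]], eps)
        exceed += int(stat >= observed)
    return (1 + exceed) / (1 + n_perm)
```

A typical call would draw `ref = sample_unit_ball(200, d, rng)` once, then pass both samples to `permutation_pvalue(ref, x, y)`. The vector field `eot_map(ref, x) - eot_map(ref, y)` is what carries the diagnostic information mentioned in the abstract: under a pure location shift, for instance, it is close to a constant vector across reference points.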