From Two Sample Testing to Singular Gaussian Discrimination

📅 2025-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the nonparametric two-sample testing problem for probability measures on general separable and compact metric spaces. It reformulates the problem as a test of singularity between corresponding Gaussian measures on a reproducing kernel Hilbert space (RKHS), where the Gaussians are constructed from the kernel mean and covariance embeddings of each probability measure. Using the Feldman–Hájek criterion for singularity/equivalence of Gaussians on Hilbert spaces, the authors show that discrepancies between distributions are heavily magnified by this embedding: at the population level, distinct probability measures yield mutually singular Gaussian embeddings. They describe this as an apparently new instance of the “blessing of dimensionality.” Because discerning two singular Gaussians is fundamentally simpler, from an information-theoretic perspective, than nonparametric two-sample testing, particularly in high dimensions, the framework could be harnessed to design efficient inference tools in great generality.

📝 Abstract

We establish that testing for the equality of two probability measures on a general separable and compact metric space is equivalent to testing for the singularity between two corresponding Gaussian measures on a suitable Reproducing Kernel Hilbert Space. The corresponding Gaussians are defined via the notion of kernel mean and covariance embedding of a probability measure. Discerning two singular Gaussians is fundamentally simpler from an information-theoretic perspective than non-parametric two-sample testing, particularly in high-dimensional settings. Our proof leverages the Feldman-Hajek criterion for singularity/equivalence of Gaussians on Hilbert spaces, and shows that discrepancies between distributions are heavily magnified through their corresponding Gaussian embeddings: at a population level, distinct probability measures lead to essentially separated Gaussian embeddings. This appears to be a new instance of the blessing of dimensionality that can be harnessed for the design of efficient inference tools in great generality.
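The kernel mean and covariance embeddings mentioned in the abstract can be illustrated with empirical estimates. The sketch below is not the paper's test. It assumes a Gaussian kernel, a centered covariance operator, and hypothetical helper names (`gaussian_gram`, `double_center`, `embedding_distances`), and it only computes the squared RKHS distance between the two empirical kernel means and the squared Hilbert–Schmidt distance between the two empirical covariance operators, both expressed through Gram matrices.

```python
import numpy as np


def gaussian_gram(A, B, bandwidth=1.0):
    """Gram matrix with entries exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def double_center(K):
    """Return H_n K H_m, where H_n = I - (1/n) 1 1^T is the centering matrix."""
    return (K
            - K.mean(axis=0, keepdims=True)
            - K.mean(axis=1, keepdims=True)
            + K.mean())


def embedding_distances(X, Y, bandwidth=1.0):
    """Squared distances between empirical mean and covariance embeddings.

    Returns (mean_sq, cov_sq):
      mean_sq = ||m_X - m_Y||^2 in the RKHS (the biased squared MMD estimate),
      cov_sq  = ||C_X - C_Y||^2 in Hilbert-Schmidt norm,
    where m and C are the empirical kernel mean and centered covariance operator.
    """
    n, m = len(X), len(Y)
    Kxx = gaussian_gram(X, X, bandwidth)
    Kyy = gaussian_gram(Y, Y, bandwidth)
    Kxy = gaussian_gram(X, Y, bandwidth)

    mean_sq = Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

    # <C_X, C_Y>_HS = ||H_n K_xy H_m||_F^2 / (n * m), and similarly for the
    # two self inner products.
    cxx = np.sum(double_center(Kxx) ** 2) / n ** 2
    cyy = np.sum(double_center(Kyy) ** 2) / m ** 2
    cxy = np.sum(double_center(Kxy) ** 2) / (n * m)
    cov_sq = cxx + cyy - 2.0 * cxy
    return mean_sq, cov_sq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    Y = rng.normal(loc=0.5, size=(200, 5))  # shifted sample, illustrative only
    print(embedding_distances(X, Y))
```

Both quantities vanish when the two samples coincide and are positive otherwise. Turning them into a calibrated test, or into the singularity-based procedure the abstract describes, requires steps that this summary does not specify.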

Problem

Research questions and friction points this paper is trying to address.

Testing equality of two probability measures on metric spaces

Deciding singularity versus equivalence of Gaussian measures on Hilbert spaces

Magnifying distribution discrepancies via Gaussian embeddings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Testing equality via Gaussian measure singularity

Leveraging kernel mean and covariance embeddings

Using Feldman-Hajek criterion for singularity analysis
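
For reference, the Feldman–Hájek criterion can be stated in its standard textbook form as follows. The notation ($m_i$, $C_i$, $\mathcal{H}$) is chosen here for illustration and is not taken from the paper.

```latex
Let $\mu_i = \mathcal{N}(m_i, C_i)$, $i = 1, 2$, be Gaussian measures on a
separable Hilbert space $\mathcal{H}$. Then $\mu_1$ and $\mu_2$ are either
equivalent or mutually singular. They are equivalent if and only if
\begin{enumerate}
  \item $\operatorname{ran}\bigl(C_1^{1/2}\bigr) = \operatorname{ran}\bigl(C_2^{1/2}\bigr)$;
  \item $m_1 - m_2 \in \operatorname{ran}\bigl(C_1^{1/2}\bigr)$;
  \item $\bigl(C_1^{-1/2} C_2^{1/2}\bigr)\bigl(C_1^{-1/2} C_2^{1/2}\bigr)^{*} - I$
        is a Hilbert--Schmidt operator on
        $\overline{\operatorname{ran}\bigl(C_1^{1/2}\bigr)}$.
\end{enumerate}
If any one of these conditions fails, then $\mu_1 \perp \mu_2$.
```

Read against the abstract, the claim is that when the two Gaussians are built from the kernel mean and covariance embeddings of distinct probability measures, these conditions cannot all hold, so the two Gaussian embeddings are mutually singular.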

🔎 Similar Papers

Learning Deep Kernels for Non-Parametric Independence Testing