Likelihood Ratio Tests by Kernel Gaussian Embedding

📅 2025-08-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the nonparametric two-sample testing problem with a likelihood ratio test built on kernel Gaussian embeddings. Each distribution is embedded into a reproducing kernel Hilbert space (RKHS) through its kernel mean and kernel covariance operator, which together define a Gaussian measure on the RKHS. Distinct distributions are mapped to mutually singular Gaussian measures. The test statistic is based on the relative entropy between the two Gaussian embeddings (the likelihood ratio). It vanishes under the null hypothesis and diverges under the alternative, so the two hypotheses are sharply separated. For finite samples, the method uses a regularized version of this statistic that is calibrated by permutation. Theoretically, the test is proven consistent and comes with uniform power guarantees under mild conditions. Empirically, it achieves higher power than state-of-the-art methods, including MMD and C2ST, with the largest gains in high-dimensional, weak-signal regimes. The framework also unifies and extends earlier tests based on spectrally regularized MMD.
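To make the "Gaussian embedding plus relative entropy" idea concrete, here is a minimal sketch under loudly stated assumptions: it uses a hypothetical explicit two-dimensional feature map `features(x) = [x, x*x]` (the feature map of the 1-D kernel k(s, t) = st + (st)²), so the RKHS is finite-dimensional. Each sample is summarized by the empirical mean and covariance of its features, a hypothetical ridge `gamma` is added to both covariances, and the closed-form KL divergence between the two resulting Gaussians is returned. The paper works with general kernels, and its exact estimator and regularization may differ.

```python
import math


def features(x):
    # Hypothetical explicit feature map for the 1-D kernel
    # k(s, t) = s*t + (s*t)**2, so the RKHS is 2-dimensional.
    return [x, x * x]


def gaussian_embedding(sample):
    """Empirical mean vector and covariance matrix of the features."""
    feats = [features(x) for x in sample]
    n, d = len(feats), len(feats[0])
    mean = [sum(f[j] for f in feats) / n for j in range(d)]
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in feats) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    d = len(b)
    y = [0.0] * d
    for i in range(d):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, d))) / L[i][i]
    return x


def regularized_kl(sample_p, sample_q, gamma=1e-3):
    """KL(N(m_p, C_p + gamma*I) || N(m_q, C_q + gamma*I)) in feature space."""
    m_p, c_p = gaussian_embedding(sample_p)
    m_q, c_q = gaussian_embedding(sample_q)
    d = len(m_p)
    for i in range(d):  # ridge regularization keeps both covariances invertible
        c_p[i][i] += gamma
        c_q[i][i] += gamma
    l_p, l_q = cholesky(c_p), cholesky(c_q)
    logdet_p = 2 * sum(math.log(l_p[i][i]) for i in range(d))
    logdet_q = 2 * sum(math.log(l_q[i][i]) for i in range(d))
    # tr(C_q^{-1} C_p): column j of C_q^{-1} C_p solves C_q z = C_p[:, j]
    trace = sum(solve(l_q, [c_p[i][j] for i in range(d)])[j] for j in range(d))
    diff = [m_q[i] - m_p[i] for i in range(d)]
    mahalanobis = sum(a * b for a, b in zip(diff, solve(l_q, diff)))
    return 0.5 * (trace + mahalanobis - d + logdet_q - logdet_p)


same = [-1.0, -0.5, 0.0, 0.5, 1.0]
shifted = [1.0, 1.5, 2.0, 2.5, 3.0]
print(regularized_kl(same, same))     # 0 up to rounding
print(regularized_kl(same, shifted))  # strictly positive
```

The Cholesky-based solves avoid forming explicit inverses. The ridge term plays the role of a regularizer: without it, an empirical covariance can be singular, and the divergence would be undefined.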

📝 Abstract

We propose a novel kernel-based nonparametric two-sample test that combines kernel mean and kernel covariance embeddings. Our test builds on recent results showing how such combined embeddings map distinct probability measures to mutually singular Gaussian measures on the kernel's RKHS. Leveraging this result, we construct a test statistic based on the relative entropy between the Gaussian embeddings, i.e. the likelihood ratio. The likelihood ratio is specifically tailored to detect equality versus singularity of two Gaussians, and satisfies a "$0/\infty$" law, in that it vanishes under the null and diverges under the alternative. To implement the test in finite samples, we introduce a regularized version, calibrated by way of permutation. We prove consistency, establish uniform power guarantees under mild conditions, and discuss how our framework unifies and extends prior approaches based on spectrally regularized MMD. Empirical results on synthetic and real data demonstrate remarkable gains in power compared to state-of-the-art methods, particularly in high-dimensional and weak-signal regimes.
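The abstract states that the regularized statistic is "calibrated by way of permutation". The following is a generic sketch of such a calibration step, assuming only that the statistic is a function of two samples. The statistic `mean_feature_gap` is a hypothetical stand-in: it is the squared distance between empirical mean embeddings (a biased MMD² estimate for the kernel k(s, t) = st + (st)²), not the paper's likelihood ratio, which would be passed in its place. The names `permutation_pvalue`, `n_perm`, and `seed` are also illustrative.

```python
import random


def permutation_pvalue(x, y, statistic, n_perm=500, seed=0):
    """Permutation p-value for H0: x and y come from the same distribution."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        if statistic(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # Counting the observed labelling as one of the permutations keeps the
    # test valid at finite n_perm (the p-value is never exactly 0).
    return (exceed + 1) / (n_perm + 1)


def mean_feature_gap(x, y):
    # Hypothetical stand-in statistic: squared distance between empirical mean
    # embeddings under the features (s, s**2), i.e. a biased MMD^2 estimate for
    # k(s, t) = s*t + (s*t)**2. A regularized likelihood ratio would replace it.
    def mean_feat(s):
        return sum(s) / len(s), sum(v * v for v in s) / len(s)
    (a1, a2), (b1, b2) = mean_feat(x), mean_feat(y)
    return (a1 - b1) ** 2 + (a2 - b2) ** 2


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]
y = [rng.gauss(2.0, 1.0) for _ in range(50)]
p_value = permutation_pvalue(x, y, mean_feature_gap)
print(p_value)
```

Because the statistic is passed in as a function, the same calibration loop applies unchanged to any two-sample statistic. The null distribution is simulated by relabelling the pooled data rather than derived analytically.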

Problem

Research questions and friction points this paper is trying to address.

Develops a kernel-based nonparametric two-sample test

Distinguishes equal from mutually singular Gaussian embeddings of two distributions

Targets testing in high-dimensional, weak-signal regimes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Kernel Gaussian embedding for two-sample testing

Likelihood ratio statistic detecting Gaussian singularity

Regularized statistic with permutation calibration for finite samples
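As a worked reference point for the likelihood ratio statistic, the standard closed form of the relative entropy between two Gaussians on $\mathbb{R}^d$ is shown below. Here $m_P, m_Q$ denote mean embeddings and $\Sigma_P, \Sigma_Q$ covariances; this notation is assumed for illustration and is not taken from the paper. Replacing each $\Sigma$ by $\Sigma + \gamma I$ for a ridge parameter $\gamma > 0$ is one illustrative way to keep every term finite, and the paper's own regularization may differ.

```latex
\mathrm{KL}\big(\mathcal{N}(m_P,\Sigma_P)\,\|\,\mathcal{N}(m_Q,\Sigma_Q)\big)
  = \tfrac{1}{2}\Big[\operatorname{tr}\big(\Sigma_Q^{-1}\Sigma_P\big)
  + (m_Q - m_P)^{\top}\Sigma_Q^{-1}(m_Q - m_P)
  - d
  + \log\frac{\det\Sigma_Q}{\det\Sigma_P}\Big]
```

The divergence is zero exactly when the means and covariances coincide. On an infinite-dimensional Hilbert space, two Gaussian measures are either equivalent or mutually singular (the Feldman–Hájek theorem), and for a mutually singular pair the relative entropy is infinite. This dichotomy is the intuition behind a statistic that vanishes under the null and diverges under the alternative.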

🔎 Similar Papers

Towards One Model for Classical Dimensionality Reduction: A Probabilistic Perspective on UMAP and t-SNE