Likelihood Ratio Tests by Kernel Gaussian Embedding

📅 2025-08-11

🤖 AI Summary
This paper addresses the nonparametric two-sample testing problem by proposing a likelihood ratio test based on kernel Gaussian embeddings. Each distribution is embedded into a reproducing kernel Hilbert space (RKHS) through its kernel mean and kernel covariance operator, which together define a Gaussian measure on the RKHS; distinct distributions are mapped to mutually singular Gaussian measures. The test statistic is the relative entropy between the two Gaussian embeddings, i.e. the likelihood ratio, which vanishes under the null hypothesis and diverges under the alternative. For finite samples, a regularized version of the statistic is used and calibrated by permutation. The authors prove consistency, establish uniform power guarantees under mild conditions, and show that the framework unifies and extends prior approaches based on spectrally regularized MMD. Empirically, the test is reported to gain substantial power over state-of-the-art methods, including MMD and C2ST, particularly in high-dimensional, weak-signal regimes.
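
For intuition, the relative entropy between two Gaussians has a well-known closed form in finite dimensions, shown below. The symbols $m_P$, $m_Q$, $\Sigma_P$, $\Sigma_Q$ and the dimension $d$ are illustrative notation rather than the paper's, and the paper's RKHS statistic is an infinite-dimensional, regularized counterpart whose exact form is not reproduced here.

```latex
\[
\mathrm{KL}\bigl(\mathcal{N}(m_P,\Sigma_P)\,\big\|\,\mathcal{N}(m_Q,\Sigma_Q)\bigr)
= \frac{1}{2}\Bigl[
    \operatorname{tr}\bigl(\Sigma_Q^{-1}\Sigma_P\bigr) - d
    + (m_Q - m_P)^{\top}\Sigma_Q^{-1}(m_Q - m_P)
    + \ln\frac{\det\Sigma_Q}{\det\Sigma_P}
\Bigr]
\]
```

The expression is zero exactly when the means and covariances coincide. It grows without bound as $\Sigma_Q$ becomes nearly degenerate in directions where the first Gaussian still has variance or mean offset, which is a loose finite-dimensional picture of the divergence under the alternative.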

📝 Abstract
We propose a novel kernel-based nonparametric two-sample test, employing the combined use of kernel mean and kernel covariance embedding. Our test builds on recent results showing how such combined embeddings map distinct probability measures to mutually singular Gaussian measures on the kernel's RKHS. Leveraging this result, we construct a test statistic based on the relative entropy between the Gaussian embeddings, i.e. the likelihood ratio. The likelihood ratio is specifically tailored to detect equality versus singularity of two Gaussians, and satisfies a "$0/\infty$" law, in that it vanishes under the null and diverges under the alternative. To implement the test in finite samples, we introduce a regularized version, calibrated by way of permutation. We prove consistency, establish uniform power guarantees under mild conditions, and discuss how our framework unifies and extends prior approaches based on spectrally regularized MMD. Empirical results on synthetic and real data demonstrate remarkable gains in power compared to state-of-the-art methods, particularly in high-dimensional and weak-signal regimes.
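
The pipeline described in the abstract (embed, compare Gaussian embeddings by relative entropy, regularize, calibrate by permutation) can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: the RKHS feature map is replaced by random Fourier features for a Gaussian kernel, the regularization is a simple ridge term `gamma` added to each empirical covariance, and the function names `rff_features`, `regularized_gaussian_kl` and `permutation_test`, together with all default parameter values, are hypothetical.

```python
import numpy as np


def rff_features(Z, n_features=50, bandwidth=1.0, seed=0):
    """Random Fourier features approximating a Gaussian kernel.

    Assumption: a finite-dimensional stand-in for the RKHS feature map.
    """
    rng = np.random.default_rng(seed)
    d = Z.shape[1]
    W = rng.normal(scale=1.0 / bandwidth, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(Z @ W + b)


def regularized_gaussian_kl(FX, FY, gamma=1e-2):
    """KL( N(mX, SX + gamma*I) || N(mY, SY + gamma*I) ) in feature space.

    Assumption: ridge regularization; the paper's regularizer may differ.
    """
    D = FX.shape[1]
    mX, mY = FX.mean(axis=0), FY.mean(axis=0)
    SX = np.cov(FX, rowvar=False) + gamma * np.eye(D)
    SY = np.cov(FY, rowvar=False) + gamma * np.eye(D)
    diff = mY - mX
    trace_term = np.trace(np.linalg.solve(SY, SX))
    mean_term = diff @ np.linalg.solve(SY, diff)
    _, logdet_X = np.linalg.slogdet(SX)
    _, logdet_Y = np.linalg.slogdet(SY)
    return 0.5 * (trace_term - D + mean_term + logdet_Y - logdet_X)


def permutation_test(X, Y, n_perm=200, gamma=1e-2, seed=0):
    """Return the observed statistic and a permutation p-value."""
    rng = np.random.default_rng(seed)
    # Featurize the pooled sample once so every permutation shares one map.
    F = rff_features(np.vstack([X, Y]), seed=seed)
    n = len(X)
    observed = regularized_gaussian_kl(F[:n], F[n:], gamma)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(F))  # reshuffle the group labels
        stat = regularized_gaussian_kl(F[idx[:n]], F[idx[n:]], gamma)
        exceed += int(stat >= observed)
    # The +1 terms count the observed statistic among the permutations.
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value
```

Calling `permutation_test(X, Y)` on two arrays of shape `(n, d)` and `(m, d)` returns the statistic and its p-value; the null hypothesis is rejected when the p-value falls below the chosen level. Permuting labels on the pooled features is what makes the calibration valid without knowing the null distribution of the statistic, at the cost of recomputing it `n_perm` times.
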
Problem

Research questions and friction points this paper is trying to address.

Develops a kernel-based nonparametric two-sample test
Detects equality versus singularity of the two Gaussian embeddings
Targets testing in high-dimensional, weak-signal regimes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kernel Gaussian embedding for two-sample testing
Likelihood ratio statistic that detects equality versus singularity of the Gaussian embeddings
Regularized statistic calibrated by permutation for finite samples