Non-asymptotic two-sample kernel testing with the spectrally truncated normalized MMD

📅 2026-04-08
🤖 AI Summary
This work studies kernel two-sample testing with the spectrally truncated normalized Maximum Mean Discrepancy (st-nMMD). Built on embeddings of probability distributions in a reproducing kernel Hilbert space, st-nMMD scales the MMD by the within-group covariance operator and regularizes that normalization by spectral truncation; the normalization has been shown to improve test power. The paper derives a non-asymptotic exponential upper bound for st-nMMD under the null hypothesis, uses it to obtain a sharp, explicit upper bound on the corresponding non-asymptotic quantile together with a data-adaptive estimator, and proposes an algorithm that tunes the hyperparameters, including the truncation level, without data splitting. Numerical experiments assess the test under both the null and alternative hypotheses.

📝 Abstract
Kernel methods provide a flexible and powerful framework for nonparametric statistical testing by embedding probability distributions into a reproducing kernel Hilbert space (RKHS). In this work, we study the kernel two-sample testing problem and focus on a normalized version of the Maximum Mean Discrepancy (MMD) as a test statistic, which scales the discrepancy by the within-group covariance operator to account for data variability. This normalization has been shown to improve test power in both theoretical and empirical settings. Because this normalization requires regularization, we study the non-asymptotic properties of the spectrally truncated normalized MMD (st-nMMD) and derive an exponential upper bound under the null hypothesis. Thanks to this result we propose a sharp and explicit upper bound for the corresponding non-asymptotic quantile, along with a data-adaptive estimator. We further propose an algorithm to tune the hyperparameters involved in the quantile estimation, including the truncation level, without requiring data splitting. We demonstrate the performance of the st-nMMD through numerical experiments under both the null and alternative hypotheses.
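The idea described in the abstract, scaling the mean-embedding difference by the within-group covariance and keeping only the leading eigendirections, can be illustrated with a minimal sketch. It is not the authors' implementation. It rests on several assumptions that do not come from the abstract: the RKHS is replaced by random Fourier features approximating a Gaussian kernel, the statistic follows the usual truncated kernel Fisher discriminant form, and the threshold comes from a permutation test rather than the paper's non-asymptotic quantile. The names `rff_features`, `truncated_normalized_mmd`, and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def rff_features(x, n_features=50, bandwidth=1.0, seed=0):
    """Random Fourier features for a Gaussian kernel (an assumption made here
    so that the feature space is finite-dimensional)."""
    rng = np.random.default_rng(seed)  # same seed -> same map for both samples
    w = rng.normal(scale=1.0 / bandwidth, size=(x.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)


def truncated_normalized_mmd(phi_x, phi_y, truncation):
    """Mean-embedding difference normalized by the pooled within-group
    covariance, restricted to its `truncation` leading eigendirections."""
    n1, n2 = len(phi_x), len(phi_y)
    n = n1 + n2
    diff = phi_x.mean(axis=0) - phi_y.mean(axis=0)
    cx = phi_x - phi_x.mean(axis=0)
    cy = phi_y - phi_y.mean(axis=0)
    cov = (cx.T @ cx + cy.T @ cy) / n  # pooled within-group covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    lam = eigvals[-truncation:]
    if lam.min() <= 1e-12:
        raise ValueError("truncation level exceeds the numerical rank")
    proj = eigvecs[:, -truncation:].T @ diff
    return (n1 * n2 / n) * float(np.sum(proj**2 / lam))


def permutation_pvalue(phi_x, phi_y, truncation, n_perm=200, seed=0):
    """Permutation calibration (swapped in for the paper's quantile bound)."""
    rng = np.random.default_rng(seed)
    observed = truncated_normalized_mmd(phi_x, phi_y, truncation)
    pooled = np.vstack([phi_x, phi_y])
    n1 = len(phi_x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = truncated_normalized_mmd(
            pooled[idx[:n1]], pooled[idx[n1:]], truncation
        )
        exceed += int(stat >= observed)
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(100, 2))           # sample from P
    y = rng.normal(loc=2.0, size=(100, 2))  # sample from Q (shifted mean)
    phi_x, phi_y = rff_features(x), rff_features(y)
    stat = truncated_normalized_mmd(phi_x, phi_y, truncation=5)
    p_value = permutation_pvalue(phi_x, phi_y, truncation=5)
    print(f"statistic = {stat:.3f}, permutation p-value = {p_value:.3f}")
```

In this sketch the truncation level is fixed by hand; the abstract states that the paper instead tunes it with a dedicated algorithm that avoids data splitting.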
Problem

Research questions and friction points this paper is trying to address.

two-sample testing
normalized MMD
non-asymptotic analysis
kernel methods
spectral truncation
Innovation

Methods, ideas, or system contributions that make the work stand out.

spectrally truncated normalized MMD
non-asymptotic analysis
kernel two-sample test
data-adaptive quantile estimation
reproducing kernel Hilbert space
Perrine Lacroix
Nantes Université, CNRS, Laboratoire de Mathématiques Jean Leray, LMJL, UMR 6629, F-44000 Nantes, France; Laboratoire de Biologie et Modélisation de la Cellule, École Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
Bertrand Michel
Nantes Université, École Centrale Nantes, CNRS, Laboratoire de Mathématiques Jean Leray, LMJL, UMR 6629, F-44000 Nantes, France
Franck Picard
LBMC - ENS Lyon - CNRS
statistics, machine learning, computational biology, genomics, single-cell biology
Vincent Rivoirard
CEREMADE, CNRS, Université Paris-Dauphine, Université PSL, 75016 Paris, France; Université Paris-Saclay, CNRS, Inria, LMO, 91405 Orsay, France