🤖 AI Summary
In kernel two-sample testing with the Maximum Mean Discrepancy (MMD), the null distribution of the test statistic has no closed-form expression, so calibration typically requires computationally expensive permutation or bootstrap resampling. To address this, the authors propose the martingale MMD (mMMD), the first method to model the estimated squared MMD as a martingale process. Under the null hypothesis, mMMD is asymptotically standard normal, so p-values can be computed analytically without resampling. The statistic retains quadratic time complexity, cutting the overall cost of the test from O(Bn²) to O(n²), where B is the number of permutations, while its power approaches that of the permutation test for large samples. The test is also consistent against any fixed alternative, combining asymptotic guarantees with scalable inference for real-world applications.
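For context on the O(Bn²) cost that mMMD avoids: below is a minimal sketch of the standard quadratic-time MMD test with permutation calibration, which recomputes the O(n²) statistic on B permuted splits of the pooled sample. The function names, the Gaussian kernel, and the bandwidth choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    # Pairwise squared distances, then the RBF kernel.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * bandwidth**2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    # Unbiased estimate of the squared MMD: diagonal terms are
    # excluded from the within-sample averages.
    n, m = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * Kxy.mean()

def permutation_pvalue(X, Y, B=200, bandwidth=1.0, seed=0):
    # Standard calibration: recompute the O(n^2) statistic on B
    # permuted splits of the pooled sample -- O(B n^2) total.
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n = len(X)
    observed = mmd2_unbiased(X, Y, bandwidth)
    count = 0
    for _ in range(B):
        perm = rng.permutation(len(Z))
        count += mmd2_unbiased(Z[perm[:n]], Z[perm[n:]], bandwidth) >= observed
    return (count + 1) / (B + 1)
```

The loop is exactly where the B factor enters; an analytically calibrated statistic such as mMMD replaces it with a single pass over the data.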
📝 Abstract
The Maximum Mean Discrepancy (MMD) is a widely used multivariate distance metric for two-sample testing. The standard MMD test statistic has an intractable null distribution, typically requiring costly resampling or permutation approaches for calibration. In this work, we leverage a martingale interpretation of the estimated squared MMD to propose the martingale MMD (mMMD), a quadratic-time statistic with a limiting standard Gaussian distribution under the null. Moreover, we show that the test is consistent against any fixed alternative and that, for large sample sizes, mMMD offers substantial computational savings over the standard MMD test, with only a minor loss in power.
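Once a statistic is asymptotically standard normal under the null, calibration reduces to evaluating a Gaussian tail probability rather than resampling. A minimal sketch of that final step, using only the standard library (the construction of the mMMD statistic itself, which is the paper's contribution, is not shown here):

```python
import math

def analytic_pvalue(t):
    # One-sided p-value for a statistic that is asymptotically
    # standard normal under the null: P(Z >= t) for Z ~ N(0, 1),
    # via the complementary error function. No resampling needed.
    return 0.5 * math.erfc(t / math.sqrt(2.0))
```

For example, an observed statistic at the 95% normal quantile (about 1.645) yields a p-value of 0.05, so the test rejects at level 0.05 exactly when the statistic exceeds that threshold.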