🤖 AI Summary
This work addresses a limitation of data-adaptive two-sample tests in few-shot settings with severe sample-size imbalance, where reference samples vastly outnumber query samples and the usual data-splitting step is ill-suited. The authors propose learning reference-dependent representations from the abundant reference data, using a collection of representation families that capture both global and local structure. An uncertainty-guided principle weights these representations adaptively using only reference samples. Combined with permutation testing, the aggregated test controls the type I error and is consistent: its power converges to one as the sample sizes grow, provided the representation set contains at least one consistent representation. Empirically, the approach achieves strong performance across a range of benchmarks while retaining type I error control.
📝 Abstract
Data-adaptive two-sample testing assesses whether two samples come from the same distribution, using a discrepancy learned from the data (e.g., via kernel-based feature representations). Such methods typically rely on data splitting to decouple learning from testing and control type I error. However, this paradigm is ill-suited to few-shot settings with severe sample-size imbalance: abundant reference samples are available, while only a handful of query samples arrive. In this paper, we show how this imbalance can be leveraged constructively. Using abundant reference data, we learn reference-dependent representations that summarize salient structure of the reference distribution and provide informative signals for detecting departures. We incorporate a collection of representation families that capture both global and local structure, and adaptively weight them using only reference samples via an uncertainty-guided principle. Theoretically, we establish permutation-based type I error control and show consistency of the aggregated test: as the sample sizes grow, the test power converges to one whenever the representation set contains at least one consistent representation. Empirically, our aggregation achieves strong performance across a range of benchmarks while retaining type I error control.
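To make the testing setup concrete, the following is a minimal sketch of a permutation test on an aggregated kernel statistic with many reference samples and few query samples. It is not the authors' method: it assumes Gaussian kernels at a few hand-picked bandwidths as stand-ins for the representation families, uses the biased MMD² estimate as the discrepancy, and fixes the weights to be uniform rather than reproducing the uncertainty-guided weighting. All function names (`gaussian_gram`, `mmd2_biased`, `permutation_test`) are hypothetical.

```python
import numpy as np


def gaussian_gram(Z, bandwidth):
    """Gaussian-kernel Gram matrix on the pooled sample Z (rows are points)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def mmd2_biased(K, is_query):
    """Biased MMD^2 estimate between the two groups defined by a boolean mask."""
    m = is_query.sum()
    n = is_query.size - m
    # Signed weights: +1/m on query points, -1/n on reference points.
    w = np.where(is_query, 1.0 / m, -1.0 / n)
    return float(w @ K @ w)


def permutation_test(reference, query, bandwidths, weights, n_perm=200, seed=0):
    """Permutation p-value for a fixed-weight sum of MMD^2 statistics.

    Assumption: bandwidths and weights are fixed in advance, so the
    statistic is a fixed function of the pooled data and the group labels.
    """
    rng = np.random.default_rng(seed)
    Z = np.vstack([reference, query])
    n, m = len(reference), len(query)
    # Gram matrices are computed once and reused for every relabeling.
    grams = [gaussian_gram(Z, b) for b in bandwidths]

    def statistic(mask):
        return sum(wt * mmd2_biased(K, mask) for wt, K in zip(weights, grams))

    observed_mask = np.zeros(n + m, dtype=bool)
    observed_mask[n:] = True
    observed = statistic(observed_mask)

    exceed = 1  # the observed labeling counts as one permutation
    for _ in range(n_perm):
        mask = np.zeros(n + m, dtype=bool)
        mask[rng.choice(n + m, size=m, replace=False)] = True
        exceed += int(statistic(mask) >= observed)
    return observed, exceed / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = rng.normal(size=(200, 2))        # abundant reference sample
    query = rng.normal(loc=3.0, size=(10, 2))    # few-shot, shifted query sample
    stat, p_value = permutation_test(
        reference, query, bandwidths=[0.5, 1.0, 2.0], weights=[1 / 3] * 3
    )
    print(f"statistic={stat:.4f}, p-value={p_value:.4f}")
```

Because the bandwidths and weights here are fixed before seeing any data, relabeling the pooled sample leaves the null distribution of the statistic unchanged, which is what justifies the permutation p-value in this sketch. Learning the weights from reference samples, as the abstract describes, requires the separate argument the paper provides.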