Thinning to improve two-sample discrepancy

๐Ÿ“… 2025-06-25
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This paper addresses the problem of minimizing the statistical discrepancy between two independent samples drawn from the same distribution. Such samples typically incur a discrepancy of $O(\sqrt{n})$, which poses a fundamental bottleneck. To overcome this, we propose an online thinning algorithm that dynamically partitions the domain via geometric decomposition and selectively discards an asymptotically negligible fraction of points—only $o(n)$—using a probability-driven pruning strategy. Our method reduces the maximum discrepancy from $O(\sqrt{n})$ to $O(\log^{2d} n)$ for arbitrary dimension $d$, achieving near-optimal theoretical guarantees while remaining simple to implement. The key contribution is the first integration of online thinning with high-dimensional geometric analysis, requiring no prior knowledge of the underlying distribution. Both empirically and theoretically, it substantially outperforms existing offline and random sampling approaches.


๐Ÿ“ Abstract
The discrepancy between two independent samples $X_1,\dots,X_n$ and $Y_1,\dots,Y_n$ drawn from the same distribution on $\mathbb{R}^d$ typically has order $O(\sqrt{n})$, even in one dimension. We give a simple online algorithm that reduces the discrepancy to $O(\log^{2d} n)$ by discarding a small fraction of the points.
Problem

Research questions and friction points this paper is trying to address.

Reducing discrepancy between two independent samples
Online algorithm for thinning sample points
Achieving polylogarithmic discrepancy by discarding points
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online algorithm reduces discrepancy
Discards fraction of points
Achieves $O(\log^{2d} n)$ discrepancy
๐Ÿ”Ž Similar Papers
No similar papers found.