๐ค AI Summary
This paper addresses the problem of minimizing statistical discrepancy between two independent samples drawn from the same distribution. Traditional methods incur a discrepancy of $O(sqrt{n})$, which poses a fundamental bottleneck. To overcome this, we propose an online thinning algorithm that dynamically partitions the domain via geometric decomposition and selectively discards an asymptotically negligible fraction of pointsโonly $o(n)$โusing a probability-driven pruning strategy. Our method reduces the maximum discrepancy from $O(sqrt{n})$ to $O(log^{2d} n)$ for arbitrary dimension $d$, achieving near-optimal theoretical guarantees while maintaining implementation simplicity. The key contribution lies in the first integration of online thinning with high-dimensional geometric analysis, requiring no prior knowledge of the underlying distribution. Empirically and theoretically, it substantially outperforms existing offline and random sampling approaches.
๐ Abstract
The discrepancy between two independent samples (X_1,dots,X_n) and (Y_1,dots,Y_n) drawn from the same distribution on $mathbb{R}^d$ typically has order (O(sqrt{n})) even in one dimension. We give a simple online algorithm that reduces the discrepancy to (O(log^{2d} n)) by discarding a small fraction of the points.