Incomplete U-Statistics of Equireplicate Designs: Berry-Esseen Bound and Efficient Construction

📅 2025-10-23

🤖 AI Summary

U-statistics pose two key challenges: high computational cost, partly addressed by incomplete U-statistics, and nonstandard asymptotic distributions in the degenerate case, which typically force hypothesis tests to rely on resampling. This paper bypasses the classical Hoeffding decomposition and develops an analytical approach grounded in hypergraph theory and combinatorial designs. By characterizing the dependence structure of a U-statistic, it derives a Berry–Esseen bound that applies to all incomplete U-statistics of deterministic designs and gives conditions for a Gaussian limit even in the degenerate case and when the order diverges. It also introduces efficient algorithms for constructing incomplete U-statistics of equireplicate designs, a subclass of deterministic designs that in certain cases achieves minimum variance. The framework is applied to kernel-based two-sample and independence tests using maximum mean discrepancy (MMD) and the Hilbert–Schmidt independence criterion (HSIC). In a real-data example on CIFAR-10, the permutation-free MMD test delivers substantial computational gains while retaining power and Type-I error control.

📝 Abstract

U-statistics are a fundamental class of estimators that generalize the sample mean and underpin much of nonparametric statistics. Although extensively studied in both statistics and probability, key challenges remain: their high computational cost - addressed partly through incomplete U-statistics - and their non-standard asymptotic behavior in the degenerate case, which typically requires resampling methods for hypothesis testing. This paper presents a novel perspective on U-statistics, grounded in hypergraph theory and combinatorial designs. Our approach bypasses the traditional Hoeffding decomposition, the main analytical tool in this literature but one highly sensitive to degeneracy. By characterizing the dependence structure of a U-statistic, we derive a Berry-Esseen bound that applies to all incomplete U-statistics of deterministic designs, yielding conditions under which Gaussian limiting distributions can be established even in the degenerate case and when the order diverges. We also introduce efficient algorithms to construct incomplete U-statistics of equireplicate designs, a subclass of deterministic designs that, in certain cases, achieve minimum variance. Finally, we apply our framework to kernel-based tests that use Maximum Mean Discrepancy (MMD) and Hilbert-Schmidt Independence Criterion. In a real data example with CIFAR-10, our permutation-free MMD test delivers substantial computational gains while retaining power and type I error control.
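To make the terminology concrete, here is a minimal sketch of an order-2 incomplete U-statistic evaluated over an equireplicate design, meaning a deterministic set of index pairs in which every observation appears the same number of times. The cyclic construction and the names `cyclic_design`, `replication_counts`, and `incomplete_u_statistic` are illustrative assumptions, not the algorithms proposed in the paper.

```python
import numpy as np

def cyclic_design(n, offsets):
    """Pairs (i, (i + d) mod n) for each offset d.

    Illustrative assumption, not the paper's construction. With distinct
    offsets satisfying 0 < d < n / 2, all pairs are distinct and every
    index appears in exactly 2 * len(offsets) pairs (equireplicate).
    """
    offsets = list(offsets)
    if len(set(offsets)) != len(offsets):
        raise ValueError("offsets must be distinct")
    if any(not 0 < d < n / 2 for d in offsets):
        raise ValueError("each offset must satisfy 0 < d < n / 2")
    return [(i, (i + d) % n) for d in offsets for i in range(n)]

def replication_counts(n, design):
    """Number of pairs in which each index appears."""
    counts = np.zeros(n, dtype=int)
    for i, j in design:
        counts[i] += 1
        counts[j] += 1
    return counts

def incomplete_u_statistic(x, kernel, design):
    """Average of a symmetric order-2 kernel over the pairs in the design."""
    return float(np.mean([kernel(x[i], x[j]) for i, j in design]))

def variance_kernel(a, b):
    """Symmetric kernel whose complete U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2

x = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0])
design = cyclic_design(len(x), offsets=[1, 3])   # 16 of the 28 possible pairs
estimate = incomplete_u_statistic(x, variance_kernel, design)
```

Here the design uses `n * len(offsets)` kernel evaluations instead of all `n * (n - 1) / 2`, which is where the computational saving of an incomplete U-statistic comes from. When the offsets cover every admissible value (for example `[1, 2]` with `n = 5`), the design contains all pairs and the statistic reduces to the complete U-statistic.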

Problem

Research questions and friction points this paper is trying to address.

Addressing the computational cost and degeneracy of U-statistics

Developing Berry-Esseen bounds for incomplete U-statistics with deterministic designs

Constructing incomplete U-statistics of equireplicate designs efficiently, with application to kernel tests
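For the kernel-test setting, the following sketch evaluates the standard order-2 core of the squared-MMD U-statistic over a deterministic set of paired indices. It assumes equal sample sizes, a Gaussian kernel with a hypothetical `bandwidth` parameter, and a simple cyclic choice of pairs. The function names are illustrative, and the sketch does not reproduce the paper's design construction or its test calibration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points; bandwidth is an assumed default."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.sum(diff * diff) / (2.0 * bandwidth ** 2)))

def mmd_core(x_i, y_i, x_j, y_j, k=gaussian_kernel):
    """Order-2 core h((x_i, y_i), (x_j, y_j)) of the squared-MMD U-statistic."""
    return k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i)

def incomplete_mmd2(x, y, offsets, k=gaussian_kernel):
    """Average the MMD core over the pairs (i, (i + d) mod n) for each offset d.

    Illustrative cyclic design: with distinct offsets in (0, n / 2), every
    index appears in the same number of pairs.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("this sketch assumes equal sample sizes")
    if any(not 0 < d < n / 2 for d in offsets):
        raise ValueError("each offset must satisfy 0 < d < n / 2")
    terms = [
        mmd_core(x[i], y[i], x[(i + d) % n], y[(i + d) % n], k)
        for d in offsets
        for i in range(n)
    ]
    return float(np.mean(terms))

x = 0.1 * np.arange(10, dtype=float).reshape(-1, 1)
same = incomplete_mmd2(x, x, offsets=[1, 2])             # identical samples
shifted = incomplete_mmd2(x, x + 10.0, offsets=[1, 2])   # well-separated samples
```

With identical samples every core term cancels, so the estimate is zero, while well-separated samples give a clearly positive value. Turning this estimate into a test still requires a normalization and a rejection threshold, which this section of the source does not specify.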

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using hypergraph theory to bypass traditional Hoeffding decomposition

Deriving a Berry-Esseen bound for incomplete U-statistics of deterministic designs

Developing efficient algorithms for constructing equireplicate designs
