Fast and light-weight energy statistics using the extit{R} package extsf{Rfast}

📅 2025-01-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high computational complexity and memory overhead of energy statistics—such as distance variance, covariance, correlation, and energy distance—in high-dimensional, large-scale data, this paper proposes a lightweight, efficient algorithmic framework. The method leverages vectorized distance compression, block-wise iteration, and memory reuse to avoid explicit construction and storage of the full distance matrix. It tightly integrates Rfast’s C++ kernel within R, achieving sub-quadratic time complexity and linear space complexity. For the first time, it enables real-time computation of computationally intensive statistics—including partial distance correlation and multivariate energy distance—while preserving strict statistical consistency. On datasets with 10,000 samples and 1,000 dimensions, the implementation achieves over 100× speedup and 90% memory reduction compared to state-of-the-art methods, with numerical precision identical to classical implementations. The framework has been successfully deployed in distribution homogeneity testing and nonlinear independence detection.

Technology Category

Application Category

📝 Abstract
Energy statistics, also known as $mathcal{varepsilon}$-statistics, are functions of distances between statistical observations. This class of functions has enabled the development of non-linear statistical concepts, such as distance variance, distance covariance, and distance correlation. However, the computational burden associated with $mathcal{varepsilon}$-statistics is substantial, particularly when the data reside in multivariate space. To address this challenge, we have developed a method to significantly reduce memory requirements and accelerate computations, thereby facilitating the analysis of large data sets. The following cases are demonstrated: univariate and multivariate distance variance, distance covariance, partial distance correlation, energy distance, and hypothesis testing for the equality of univariate distributions.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational burden of energy statistics
Accelerating multivariate distance-based statistical analysis
Enabling large dataset analysis with efficient memory usage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reduces memory usage for energy statistics
Accelerates computations in multivariate space
Enables large dataset analysis efficiently
🔎 Similar Papers
No similar papers found.