Tight Bounds for Low-Error Frequency Moment Estimation and the Power of Multiple Passes

📅 2025-09-09
🤖 AI Summary
This paper resolves the optimal space complexity for estimating the second frequency moment $F_2$ of a data stream to within a $(1 \pm \varepsilon)$ factor when the error is small, $\varepsilon < 1/\sqrt{n}$. To bridge the long-standing gap between the $\Omega(n)$ lower bound and the $O(n \log n)$ upper bound, we first fully characterize the two-party communication complexity of estimating set intersection size under additive error, establishing a tight $\Omega(n \log n)$ lower bound for one-way communication in this regime. Leveraging this insight, we introduce a novel multi-pass paradigm and design a two-pass algorithm that reconstructs the stream histogram exactly with high probability using only $O(n \log \log n)$ bits of space. Our main result is a tight space complexity characterization for $F_2$ estimation for all $\varepsilon \geq 1/n^2$: $\Theta\big(\min(n, 1/\varepsilon^2) \cdot (1 + |\log(\varepsilon^2 n)|)\big)$ bits. This is the first asymptotic separation between single-pass and constant-pass algorithms for $F_2$.
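
To get a feel for how the stated bound behaves across error regimes, the expression inside the $\Theta(\cdot)$ can be evaluated directly. The sketch below is only illustrative: the function name `f2_space_bound` is hypothetical, the hidden constant is ignored, and the logarithm is taken base 2 as an assumption (the asymptotic statement does not fix a base).

```python
import math


def f2_space_bound(n: int, eps: float) -> float:
    """Evaluate min(n, 1/eps^2) * (1 + |log2(eps^2 * n)|).

    Illustrative only: constants hidden by Theta(.) are dropped, and
    base-2 logarithms are an assumption made for this sketch.
    """
    if n <= 0 or eps <= 0:
        raise ValueError("n and eps must be positive")
    return min(n, 1.0 / eps**2) * (1.0 + abs(math.log2(eps * eps * n)))


if __name__ == "__main__":
    n = 2**20
    # eps = 2^-5 is above 1/sqrt(n); eps = 2^-10 equals 1/sqrt(n);
    # eps = 2^-20 equals 1/n, deep in the small-error regime.
    for eps in (2**-5, 2**-10, 2**-20):
        print(f"eps = {eps:.3e}: {f2_space_bound(n, eps):.0f}")
```

At $\varepsilon = 1/\sqrt{n}$ the logarithmic factor vanishes and the expression equals $n$; at $\varepsilon = 1/n$ it grows to $n(1 + \log n)$, which matches the $n \log n$ behavior described for the small-error regime.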

📝 Abstract
Estimating the second frequency moment $F_2$ of a data stream up to a $(1 \pm \varepsilon)$ factor is a central problem in the streaming literature. For errors $\varepsilon > \Omega(1/\sqrt{n})$, the tight bound $\Theta\left(\log(\varepsilon^2 n)/\varepsilon^2\right)$ was recently established by Braverman and Zamir. In this work, we complete the picture by resolving the remaining regime of small error, $\varepsilon < 1/\sqrt{n}$, showing that the optimal space complexity is $\Theta\left( \min\left(n, \frac{1}{\varepsilon^2}\right) \cdot \left(1 + \left| \log(\varepsilon^2 n) \right| \right) \right)$ bits for all $\varepsilon \geq 1/n^2$, assuming a sufficiently large universe. This closes the gap between the best known $\Omega(n)$ lower bound and the straightforward $O(n \log n)$ upper bound in that range, and shows that essentially storing the entire stream is necessary for high-precision estimation. To derive this bound, we fully characterize the two-party communication complexity of estimating the size of a set intersection up to an arbitrary additive error $\varepsilon n$. In particular, we prove a tight $\Omega(n \log n)$ lower bound for one-way communication protocols when $\varepsilon < n^{-1/2-\Omega(1)}$, in contrast to classical $O(n)$-bit protocols that use two-way communication. Motivated by this separation, we present a two-pass streaming algorithm that computes the exact histogram of a stream with high probability using only $O(n \log \log n)$ bits of space, in contrast to the $\Theta(n \log n)$ bits required in one pass even to approximate $F_2$ with small error. This yields the first asymptotic separation between one-pass and $O(1)$-pass space complexity for small frequency moment estimation.
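
As a point of reference for the "straightforward" upper bound, $F_2$ is the sum of squared item frequencies, so it can be computed exactly by keeping the full histogram of the stream. The following is a minimal sketch of that baseline, not the paper's algorithm; the function name `exact_f2` is hypothetical, and a Python `Counter` stands in for whatever compact encoding a real streaming implementation would use.

```python
from collections import Counter
from typing import Hashable, Iterable


def exact_f2(stream: Iterable[Hashable]) -> int:
    """Compute F_2 = sum over items of (frequency)^2 in one pass.

    This stores one counter per distinct item, i.e. essentially the
    whole histogram, which is the naive baseline rather than a
    space-efficient sketch.
    """
    histogram = Counter()
    for item in stream:  # single pass over the stream
        histogram[item] += 1
    return sum(count * count for count in histogram.values())


if __name__ == "__main__":
    # Frequencies: a -> 3, b -> 2, c -> 1, so F_2 = 9 + 4 + 1 = 14.
    print(exact_f2(["a", "b", "a", "c", "a", "b"]))
```

A stream of length $n$ has at most $n$ distinct items, so the histogram never holds more than $n$ counters; the abstract's point is that, in one pass and for small $\varepsilon$, no algorithm can do asymptotically better than storing essentially this much.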

Problem

Research questions and friction points this paper is trying to address.

Determining optimal space complexity for low-error F2 estimation
Characterizing communication complexity of set intersection estimation
Establishing separation between one-pass and multi-pass algorithms
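
The link between $F_2$ and set intersection can be illustrated with a standard identity: if a stream lists the elements of a set $A$ followed by those of a set $B$, then $F_2 = |A| + |B| + 2|A \cap B|$, because shared elements have frequency 2 and contribute 4 each. The sketch below checks this identity; it is an assumed, textbook-style illustration and not necessarily the reduction used in the paper, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def f2(stream: Iterable[Hashable]) -> int:
    """Second frequency moment: sum of squared item frequencies."""
    return sum(c * c for c in Counter(stream).values())


def intersection_size_from_f2(a: Set[Hashable], b: Set[Hashable]) -> int:
    """Recover |A ∩ B| from F_2 of the stream 'elements of A, then B'.

    Uses F_2 = |A| + |B| + 2|A ∩ B|, so an estimate of F_2 translates
    into an estimate of the intersection size.
    """
    stream = list(a) + list(b)
    return (f2(stream) - len(a) - len(b)) // 2


if __name__ == "__main__":
    a, b = {1, 2, 3, 4}, {3, 4, 5}
    print(intersection_size_from_f2(a, b), len(a & b))
```

Under this identity, when $|A|$ and $|B|$ are on the order of $n$, a $(1 \pm \varepsilon)$ estimate of $F_2$ yields the intersection size up to additive error on the order of $\varepsilon n$, which is the error scale the abstract uses for the communication problem.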

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-pass streaming algorithm for exact histogram
Tight space complexity bounds for small error
Characterizing two-party communication complexity for set intersection