🤖 AI Summary
This paper resolves the optimal space complexity for $(1pmvarepsilon)$-approximating the second frequency moment $F_2$ in data streams. For frequency moments of order $p in (1,2]$, we establish the first tight information-theoretic lower bound of $Omega(log(nvarepsilon^2)/varepsilon^2)$ for $F_2$ estimation, valid across the full range $varepsilon = Omega(1/sqrt{n})$. To match this bound, we design a novel truncated randomized linear sketch algorithm. Our results unify and settle the asymptotically tight space complexity for $F_2$ estimation as $Theta(log(nvarepsilon^2)/varepsilon^2)$. This work completes the theoretical characterization of optimal space complexity for frequency moments in the range $p in (1,2]$, thereby closing the last remaining gap in this long-standing line of research.
📝 Abstract
Estimating the second frequency moment of a stream up to (1±ε) multiplicative error requires at most O(logn / ε2) bits of space, due to a seminal result of Alon, Matias, and Szegedy. It is also known that at least Ω(logn + 1/ε2) space is needed. We prove a tight lower bound of Ω(log(n ε2 ) / ε2) for all ε = Ω(1/√n). Note that when ε>n−1/2 + c, where c>0, our lower bound matches the classic upper bound of AMS. For smaller values of ε we also introduce a revised algorithm that improves the classic AMS bound and matches our lower bound. Our lower bound holds also for the more general problem of p-th frequency moment estimation for the range of p∈ (1,2], giving a tight bound in the only remaining range to settle the optimal space complexity of estimating frequency moments.