🤖 AI Summary
This work addresses spherical range counting in high-dimensional Euclidean spaces, where the curse of dimensionality makes exact solutions impractical: they appear to require space exponential in the dimension. For weighted point sets, we present what is, to the best of our knowledge, the first data structure for approximate spherical range counting with stretch factor $1+\varepsilon$, for any $\varepsilon>0$, that combines efficient space usage with query time that remains sublinear whenever the number of points in the ambiguity zone, $t_q$, is sublinear. The data structure uses near-linear space $n^{1+o(1)}$ and answers queries in time $n^{1-\Theta(\varepsilon^4/\log(1/\varepsilon))} + t_q^{\varrho} n^{1-\varrho}$, where $\varrho = \Theta(\varepsilon^2)$. The worst-case bounds are supplemented with a query-driven preprocessing algorithm that adapts the data structure to the query distribution.
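To see why the second term of the query time stays sublinear for any sublinear $t_q$, it can be rewritten as follows. This is a short sketch that assumes $\varepsilon$, and hence $\varrho>0$, is a fixed constant:

$$
t_q^{\varrho}\, n^{1-\varrho} \;=\; n\cdot\left(\frac{t_q}{n}\right)^{\varrho}.
$$

If $t_q = o(n)$, then $t_q/n \to 0$, so $(t_q/n)^{\varrho} \to 0$ and the term is $o(n)$. For instance, $t_q \le n^{1-\delta}$ with $\delta\in(0,1]$ gives $t_q^{\varrho} n^{1-\varrho} \le n^{1-\delta\varrho}$. If instead $t_q = \Theta(n)$, the term is $\Theta(n)$, so the bound degrades to linear only when a constant fraction of the points lies in the ambiguity zone.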
📝 Abstract
We study the following range searching problem in high-dimensional Euclidean spaces: given a finite set $P\subset \mathbb{R}^d$, where each $p\in P$ is assigned a weight $w_p$, and a radius $r>0$, we need to preprocess $P$ into a data structure such that when a new query point $q\in \mathbb{R}^d$ arrives, the data structure reports the cumulative weight of the points of $P$ within Euclidean distance $r$ from $q$. Solving the problem exactly seems to require space usage that is exponential in the dimension, a phenomenon known as the curse of dimensionality. Thus, we focus on approximate solutions where points up to $(1+\varepsilon)r$ away from $q$ may be taken into account, where $\varepsilon>0$ is an input parameter known during preprocessing. We build a data structure with near-linear space usage and query time $n^{1-\Theta(\varepsilon^4/\log(1/\varepsilon))}+t_q^{\varrho}\cdot n^{1-\varrho}$, for some $\varrho=\Theta(\varepsilon^2)$, where $t_q$ is the number of points of $P$ in the ambiguity zone, i.e., at distance between $r$ and $(1+\varepsilon)r$ from the query $q$. To the best of our knowledge, this is the first data structure with efficient space usage (subquadratic or near-linear for any $\varepsilon>0$) and query time that remains sublinear for any sublinear $t_q$. We supplement our worst-case bounds with a query-driven preprocessing algorithm to build data structures that are well-adapted to the query distribution.
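The problem statement can be made concrete with a linear-scan baseline. The sketch below is not the data structure from the paper; it only illustrates which answers count as valid for an approximate query and how $t_q$ is defined. The function name `approx_range_count_bounds`, the half-open treatment of the ambiguity zone ($r < \text{dist} \le (1+\varepsilon)r$), and the assumption of nonnegative weights are illustrative choices, not taken from the source.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]


def approx_range_count_bounds(
    points: Sequence[Point],
    weights: Sequence[float],
    q: Point,
    r: float,
    eps: float,
) -> Tuple[float, float, int]:
    """Brute-force O(n * d) reference for approximate spherical range counting.

    Returns (lower, upper, t_q), where
      lower = total weight of points at distance <= r from q (the exact answer),
      upper = total weight of points at distance <= (1 + eps) * r from q,
      t_q   = number of points in the ambiguity zone, r < dist <= (1 + eps) * r.

    Assumption: with nonnegative weights, any value in [lower, upper] that
    corresponds to including some subset of the ambiguity-zone points is an
    acceptable approximate answer.
    """
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")

    outer_radius = (1.0 + eps) * r
    lower = 0.0
    zone_weight = 0.0
    t_q = 0
    for p, w in zip(points, weights):
        dist = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        if dist <= r:
            lower += w          # must always be counted
        elif dist <= outer_radius:
            zone_weight += w    # may or may not be counted
            t_q += 1
    return lower, lower + zone_weight, t_q


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (1.05, 0.0), (3.0, 0.0)]
    wts = [1, 2, 4, 8]
    print(approx_range_count_bounds(pts, wts, q=(0.0, 0.0), r=1.0, eps=0.1))
```

A scan like this takes time linear in $n$ per query, which is the cost that the sublinear query bound above improves on whenever $t_q$ is sublinear.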