🤖 AI Summary
This work addresses the challenge of efficient approximate nearest neighbor search in $\ell_p$ metric spaces for $p > 2$ in large-scale data settings. The authors propose a randomized data structure that combines metric embeddings with a careful analysis of the geometric properties of $\ell_p$ spaces to obtain an effective indexing strategy. Their method achieves query time nearly logarithmic in the dataset size—specifically $\text{poly}(d \log n)$—with only polynomial space $\text{poly}(dn)$ and provably controlled approximation error. A key contribution is the approximation ratio $p^{O(1) + \log \log p}$, which improves upon, or is incomparable to, the best prior results in this regime (Bartal and Gottlieb, TCS 2019; Krauthgamer, Petruschka and Sapir, FOCS 2025), thereby addressing a longstanding performance bottleneck in high-dimensional $\ell_p$ approximate nearest neighbor search.
📝 Abstract
The Nearest Neighbor Search (NNS) problem asks to design a data structure that preprocesses an $n$-point dataset $X$ lying in a metric space $\mathcal{M}$, so that given a query point $q \in \mathcal{M}$, one can quickly return a point of $X$ minimizing the distance to $q$. The efficiency of such a data structure is evaluated primarily by the amount of space it uses and the time required to answer a query. We focus on the fast query-time regime, often modeled by query time $\text{poly}(d \log n)$, which is crucial for modern large-scale applications where datasets are massive and queries must be processed online. Our main result is such a randomized data structure for NNS in $\ell_p$ spaces, $p>2$, that achieves $p^{O(1) + \log\log p}$ approximation with fast query time and $\text{poly}(dn)$ space. Our data structure improves, or is incomparable to, the state-of-the-art for the fast query-time regime from [Bartal and Gottlieb, TCS 2019] and [Krauthgamer, Petruschka and Sapir, FOCS 2025].
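To make the problem statement concrete, here is a minimal sketch of exact NNS under the $\ell_p$ distance via a linear scan. This baseline is *not* the paper's data structure: it needs no preprocessing but takes $O(nd)$ time per query, which is exactly the cost the $\text{poly}(d \log n)$-query-time regime aims to avoid (at the price of an approximation factor). The function names are illustrative, not from the paper.

```python
import math

def lp_distance(x, y, p):
    """The l_p distance between points x and y (p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def nearest_neighbor(dataset, q, p):
    """Exact NNS baseline: scan all n points, O(n * d) per query.
    An NNS data structure in the fast query-time regime must answer
    (approximate) queries in poly(d log n) time instead."""
    return min(dataset, key=lambda x: lp_distance(x, q, p))

# Example with n = 3 points in d = 2 dimensions under the l_3 distance:
X = [(0.0, 0.0), (5.0, 5.0), (1.0, 2.0)]
print(nearest_neighbor(X, (1.0, 1.0), p=3))  # -> (1.0, 2.0)
```

An approximate data structure relaxes the guarantee: it may return any point of $X$ whose distance to $q$ is within the approximation factor (here $p^{O(1)+\log\log p}$) of the true minimum.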