Neighborhood Stability as a Measure of Nearest Neighbor Searchability

📅 2026-02-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of analytical tools for judging whether a dataset is suitable for clustering-based approximate nearest neighbor search (ANNS), a property the authors call searchability. It proposes two neighborhood stability measures: clustering-NSM, an internal measure of clustering quality that is predictive of ANNS accuracy, and point-NSM, a measure of the dataset's clusterability that is predictive of clustering-NSM. Both depend only on nearest neighbor relationships among points, not on the distance values themselves, so they apply to various distance functions, including inner product. Taken together, the two measures make it possible to judge from the data points alone whether a dataset is searchable by clustering-based ANNS, which gives a principled basis for algorithm selection.
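To make the idea of a measure that depends on neighbor relationships rather than distance values concrete, here is a minimal sketch of a hypothetical proxy: the fraction of each point's k nearest neighbors that fall in the same partition as the point. This page does not give the formulas for clustering-NSM or point-NSM, so the function names, the parameter `k`, and the statistic itself are assumptions made for illustration and should not be read as the paper's definitions.

```python
import math


def knn_indices(points, i, k, dissim):
    """Indices of the k points closest to points[i] under `dissim` (lower = closer)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dissim(points[i], points[j]))
    return others[:k]


def same_cluster_neighbor_fraction(points, assign, k, dissim):
    """Hypothetical proxy, NOT the paper's clustering-NSM.

    Returns the fraction of (point, neighbor) pairs, over each point's k
    nearest neighbors, in which both points carry the same cluster label.
    Only the ranking induced by `dissim` matters, not its numeric values.
    """
    matches = 0
    for i in range(len(points)):
        for j in knn_indices(points, i, k, dissim):
            matches += assign[j] == assign[i]
    return matches / (len(points) * k)


def sq_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sq_euclidean(a, b))


def neg_inner_product(a, b):
    # Larger inner product means more similar, so negate it to rank neighbors.
    return -sum(x * y for x, y in zip(a, b))
```

Because only the neighbor ranking is used, `sq_euclidean` and `euclidean` give the same score, and swapping in `neg_inner_product` requires no other change. That is the sense in which a neighbor-based measure carries over to other distance functions.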

📝 Abstract

Clustering-based Approximate Nearest Neighbor Search (ANNS) organizes a set of points into partitions, and searches only a few of them to find the nearest neighbors of a query. Despite its popularity, there are virtually no analytical tools to determine the suitability of clustering-based ANNS for a given dataset -- what we call "searchability." To address that gap, we present two measures for flat clusterings of high-dimensional points in Euclidean space. First is Clustering-Neighborhood Stability Measure (clustering-NSM), an internal measure of clustering quality -- a function of a clustering of a dataset -- that we show to be predictive of ANNS accuracy. The second, Point-Neighborhood Stability Measure (point-NSM), is a measure of clusterability -- a function of the dataset itself -- that is predictive of clustering-NSM. The two together allow us to determine whether a dataset is searchable by clustering-based ANNS given only the data points. Importantly, both are functions of nearest neighbor relationships between points, not distances, making them applicable to various distance functions including inner product.
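The partition-then-probe scheme described at the start of the abstract can be sketched as follows. This is a generic illustration that assumes a plain k-means partition and squared Euclidean distance. The function names and the `nprobe` parameter are hypothetical and are not taken from the paper.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(p, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignment of each point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        assign = [nearest_centroid(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a partition is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment consistent with the returned centroids.
    assign = [nearest_centroid(p, centroids) for p in points]
    return centroids, assign


def search(query, points, centroids, assign, nprobe=1):
    """Scan only the `nprobe` partitions whose centroids are closest to the query.

    Returns the index of the best candidate found, or None if the probed
    partitions are empty. The true nearest neighbor is missed whenever it
    lives in a partition that was not probed.
    """
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    probed = set(order[:nprobe])
    candidates = [i for i, a in enumerate(assign) if a in probed]
    if not candidates:
        return None
    return min(candidates, key=lambda i: sq_dist(query, points[i]))
```

Setting `nprobe` to the number of partitions reduces the search to an exact scan. Smaller values trade accuracy for speed, and how much accuracy is lost depends on how often a query's true neighbors share its nearest partitions, which is the dataset property the abstract calls searchability.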

Problem

Research questions and friction points this paper is trying to address.

Approximate Nearest Neighbor Search

Searchability

Clustering

Neighborhood Stability

High-dimensional Data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neighborhood Stability

Approximate Nearest Neighbor Search

Clustering Quality