🤖 AI Summary
This work addresses two limitations of existing subspace collision methods for high-dimensional approximate nearest neighbor (ANN) search: they ignore the data distribution and they ignore query characteristics, which leads to imbalanced index structures and wasted work at query time. To overcome these issues, the authors propose TaCo, a framework that integrates both data-adaptive and query-aware mechanisms into subspace collision. TaCo uses an entropy-balanced data transformation to enable adaptive subspace partitioning and introduces a query-aware strategy that dynamically allocates query overhead. Experiments on real-world datasets show that, compared to state-of-the-art subspace collision methods, TaCo achieves up to 8× faster index construction, reduces the memory footprint to 0.6×, and delivers over 1.5× the query throughput.
📝 Abstract
Approximate Nearest Neighbor Search (ANNS) in high-dimensional Euclidean spaces is a fundamental problem with broad applications. Subspace Collision is a recently proposed ANNS framework that provides a novel paradigm for similarity search and achieves superior indexing and query performance. However, the subspace collision framework remains data-agnostic and query-oblivious, resulting in imbalanced index construction and wasted query overhead. In this paper, we address these limitations from two aspects. First, we design a subspace-oriented data transformation mechanism by averaging the entropies computed over each subspace of the transformed data, which ensures balanced subspace partitioning (in an information-theoretic sense) and enables data-adaptive subspace collision. Second, we present query-aware and scalable query strategies that dynamically allocate overhead for each query and accelerate collision probing within subspaces. Building on these ideas, we propose a novel data-adaptive and query-aware subspace collision method, abbreviated as TaCo, which achieves efficient and accurate ANN search while maintaining an excellent balance between indexing and query performance. Extensive experiments on real-world datasets demonstrate that, compared to state-of-the-art subspace collision methods, TaCo achieves up to 8x speedup in indexing and reduces the memory footprint to 0.6x, while achieving over 1.5x query throughput. Moreover, TaCo achieves state-of-the-art indexing performance and provides an effective balance between indexing and query efficiency, even when compared with advanced methods beyond the subspace-collision paradigm. This paper was published in SIGMOD 2026.
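
To make the idea of entropy-balanced subspaces concrete, the following is a minimal sketch rather than TaCo's actual transformation, which the abstract does not specify. It assumes Gaussian-distributed data, so that the differential entropy of a subspace can be estimated from the log-determinant of its covariance, and it uses a hand-picked dimension permutation (`perm`) as a stand-in for a learned transformation. The function name `subspace_entropies`, the synthetic data, and the permutation are all hypothetical choices made for illustration.

```python
import numpy as np

def subspace_entropies(X, num_subspaces):
    """Estimate the Gaussian differential entropy (in nats) of each
    contiguous block of dimensions. Assumption: data is roughly Gaussian."""
    blocks = np.array_split(np.arange(X.shape[1]), num_subspaces)
    entropies = []
    for dims in blocks:
        cov = np.atleast_2d(np.cov(X[:, dims], rowvar=False))
        _, logdet = np.linalg.slogdet(cov)
        k = len(dims)
        # h = 0.5 * log((2*pi*e)^k * det(cov))
        entropies.append(0.5 * (k * np.log(2 * np.pi * np.e) + logdet))
    return np.array(entropies)

rng = np.random.default_rng(0)

# Synthetic, skewed data: per-dimension standard deviations decay sharply.
scales = np.array([8.0, 4.0, 4.0, 2.0, 1.0, 0.5, 0.5, 0.25])
X = rng.normal(size=(5000, 8)) * scales

# Naive contiguous split: the first subspace holds almost all the information.
raw = subspace_entropies(X, 2)

# Hypothetical balancing step: permute dimensions so both subspaces have the
# same product of variances. A permutation is an orthogonal transformation,
# so Euclidean distances between points are unchanged.
perm = np.array([0, 3, 5, 6, 1, 2, 4, 7])
balanced = subspace_entropies(X[:, perm], 2)

print("entropy gap, contiguous split:", abs(raw[0] - raw[1]))
print("entropy gap, balanced split:  ", abs(balanced[0] - balanced[1]))
```

In this toy setting the contiguous split leaves one subspace far more informative than the other, while the permuted split brings the two estimates close together. A real data-adaptive method would have to derive the transformation from the data instead of hard-coding it.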