Approximate Vector Set Search: A Bio-Inspired Approach for High-Dimensional Spaces

📅 2024-12-04
🏛️ arXiv.org

🤖 AI Summary
To address efficiency bottlenecks in high-dimensional vector set similarity search, which stem from the combinatorial explosion of pairwise comparisons and the curse of dimensionality, this paper proposes BioVSS, a bio-inspired approximate search framework. It adapts the fruit fly's olfactory circuit to quantize vectors into sparse binary codes, then builds a compact set-level index on the membership property of the Bloom filter, which prunes the search space through efficient membership filtering. On million-scale datasets, it reports over 50× speedup against linear scan with recall of up to 98.9%. Key contributions include: (1) transferring a biological neural mechanism to vector set retrieval; (2) lightweight index primitives designed for set-structured data; and (3) a favorable trade-off between accuracy and efficiency for scalable similarity search.

📝 Abstract
Vector set search, an underexplored similarity search paradigm, aims to find vector sets similar to a query set. This search paradigm leverages the inherent structural alignment between sets and real-world entities to model more fine-grained and consistent relationships for diverse applications. This task, however, faces more severe efficiency challenges than traditional single-vector search due to the combinatorial explosion of pairings in set-to-set comparisons. In this work, we aim to address the efficiency challenges posed by the combinatorial explosion in vector set search, as well as the curse of dimensionality inherited from single-vector search. To tackle these challenges, we present an efficient algorithm for vector set search, BioVSS (Bio-inspired Vector Set Search). BioVSS simulates the fly olfactory circuit to quantize vectors into sparse binary codes and then designs an index based on the set membership property of the Bloom filter. The quantization and indexing strategy enables BioVSS to efficiently perform vector set search by pruning the search space. Experimental results demonstrate over 50 times speedup compared to linear scanning on million-scale datasets while maintaining a high recall rate of up to 98.9%, making it an efficient solution for vector set search.
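
The quantization step described in the abstract can be illustrated with a fly-style hash: a sparse random expansion followed by winner-take-all. This is a minimal sketch, not the BioVSS implementation. The names `make_projection` and `fly_hash`, and the parameters `code_len`, `fan_in`, and `top_k`, are hypothetical choices made for this example.

```python
import random

def make_projection(dim, code_len, fan_in, seed=0):
    """For each of code_len output units, pick fan_in random input dimensions.

    Assumption: a sparse binary wiring, loosely modeled on the fly olfactory circuit.
    """
    rng = random.Random(seed)
    return [rng.sample(range(dim), fan_in) for _ in range(code_len)]

def fly_hash(vec, projection, top_k):
    """Return the active bit positions of a sparse binary code for vec."""
    # Expansion: each output unit sums the inputs it is wired to.
    activations = [sum(vec[i] for i in wires) for wires in projection]
    # Winner-take-all: only the top_k strongest units are set to 1.
    order = sorted(range(len(activations)),
                   key=lambda j: activations[j], reverse=True)
    return frozenset(order[:top_k])

projection = make_projection(dim=8, code_len=64, fan_in=3, seed=42)
v = [0.9, 0.1, 0.4, 0.0, 0.7, 0.2, 0.3, 0.5]
code = fly_hash(v, projection, top_k=4)  # 4 active bits out of 64
```

Because only `top_k` of the `code_len` positions are active, the code is sparse and binary, which is the property the index relies on.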
Problem

Research questions and friction points this paper is trying to address.

Address efficiency challenges in vector set search
Overcome combinatorial explosion in set-to-set comparisons
Mitigate curse of dimensionality in vector searches
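
To make the cost of set-to-set comparison concrete, here is a brute-force linear scan. It assumes the Hausdorff distance as the set distance, which this page does not specify, so treat that choice as illustrative. With m query vectors and n vectors per candidate set, every candidate costs on the order of m × n distance evaluations, each of which grows with the dimension.

```python
import math

def hausdorff(set_a, set_b):
    """Hausdorff distance between two vector sets (an assumed set distance)."""
    def directed(xs, ys):
        # Every x is paired with every y: len(xs) * len(ys) evaluations.
        return max(min(math.dist(x, y) for y in ys) for x in xs)
    return max(directed(set_a, set_b), directed(set_b, set_a))

def linear_scan(query, database):
    """Return the id of the set closest to query by exhaustive comparison."""
    return min(database, key=lambda sid: hausdorff(query, database[sid]))

query = [(0.0, 0.0), (1.0, 0.0)]
database = {
    "near": [(0.0, 0.1), (1.0, 0.1)],
    "far": [(5.0, 5.0), (6.0, 5.0)],
}
best = linear_scan(query, database)  # "near"
```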
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fly olfactory circuit inspired vector quantization
Bloom filter based index for set membership
Efficient search space pruning for speedup
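
The Bloom filter idea above can be sketched as a set-level signature: the sparse binary codes of all vectors in a set are OR-ed into one bit array, and candidate sets are ranked by how many bits they share with the query signature. This is a simplified illustration under stated assumptions, not the paper's index. The names `set_signature` and `prune`, the toy codes, and the overlap-count ranking are all hypothetical.

```python
def set_signature(codes):
    """OR the active bit positions of every vector code in a set into one int."""
    sig = 0
    for code in codes:
        for bit in code:
            sig |= 1 << bit
    return sig

def prune(query_sig, index, keep):
    """Return ids of the `keep` sets whose signatures overlap the query most."""
    scored = sorted(index.items(),
                    key=lambda kv: bin(query_sig & kv[1]).count("1"),
                    reverse=True)
    return [set_id for set_id, _ in scored[:keep]]

# Toy database: each set holds sparse codes given as active bit positions.
database = {
    "A": [{1, 5, 9}, {2, 5, 11}],
    "B": [{20, 21, 22}, {23, 24, 25}],
    "C": [{1, 5, 10}, {30, 31, 32}],
}
index = {sid: set_signature(codes) for sid, codes in database.items()}

query = [{1, 5, 9}, {2, 6, 11}]
candidates = prune(set_signature(query), index, keep=2)  # ["A", "C"]
```

Only the surviving candidates would then need an exact set-to-set comparison, which is where a speedup over linear scan would come from in a design like this.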