PDX: A Data Layout for Vector Similarity Search

📅 2025-03-06
🤖 AI Summary
Dimension-pruning algorithms for approximate vector search can lose to SIMD-optimized linear scans when vectors are stored in the conventional horizontal layout. This paper proposes PDX, a vertical data layout that stores multiple vectors per block, dimension by dimension, so that search proceeds over multiple vectors at a time in tight loops of scalar code that the compiler auto-vectorizes. Compared to SIMD-optimized distance kernels on horizontal storage, PDX is on average 40% faster. On PDX, the dimension-pruning algorithms ADSampling and BSA regain a 2-7x benefit in approximate search. The paper also introduces PDX-BOND, a more flexible dimension-pruning strategy that works on vector data as-is, without preprocessing, which makes it attractive for vector databases with frequent updates. PDX-BOND performs well on exact search and reasonably on approximate search.

📝 Abstract
We propose Partition Dimensions Across (PDX), a data layout for vectors (e.g., embeddings) that, similar to PAX [6], stores multiple vectors in one block, using a vertical layout for the dimensions (Figure 1). PDX accelerates exact and approximate similarity search thanks to its dimension-by-dimension search strategy, which operates on multiple vectors at a time in tight loops. It beats SIMD-optimized distance kernels on standard horizontal vector storage (on average 40% faster) while relying only on scalar code that gets auto-vectorized. We combined the PDX layout with the recent dimension-pruning algorithms ADSampling [19] and BSA [52], which accelerate approximate vector search. We found that these algorithms, when run on the horizontal vector layout, can lose to SIMD-optimized linear scans even if they are themselves SIMD-optimized. When used on PDX, however, their benefit is restored to 2-7x. We find that search on PDX is especially fast if only a limited number of dimensions has to be scanned fully, which is what the dimension-pruning approaches achieve. Finally, we introduce PDX-BOND, an even more flexible dimension-pruning strategy, with good performance on exact search and reasonable performance on approximate search. Unlike previous pruning algorithms, it can work on vector data "as-is", without preprocessing, making it attractive for vector databases with frequent updates.
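The vertical layout and the dimension-by-dimension scan described in the abstract can be illustrated with a minimal sketch. It is written in plain Python for readability, so it shows only the memory organisation and loop order, not the auto-vectorization or the performance the paper reports. The function names `to_pdx_block` and `l2_squared_pdx` are hypothetical and are not taken from the paper.

```python
def to_pdx_block(vectors):
    """Transpose row-major vectors into a dimension-major block.

    block[d][i] holds dimension d of vector i, so all values of one
    dimension are stored next to each other.
    """
    if not vectors:
        return []
    n_dims = len(vectors[0])
    return [[v[d] for v in vectors] for d in range(n_dims)]


def l2_squared_pdx(block, query):
    """Squared L2 distance from the query to every vector in the block.

    The outer loop walks dimensions; the inner loop updates the running
    distance of all vectors at once (the tight multi-vector loop).
    """
    n_vectors = len(block[0]) if block else 0
    distances = [0.0] * n_vectors
    for d, column in enumerate(block):
        q = query[d]
        for i in range(n_vectors):
            diff = column[i] - q
            distances[i] += diff * diff
    return distances


# Three 3-dimensional vectors stored horizontally (one row per vector).
vectors = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.0, 0.0, 0.0]]
block = to_pdx_block(vectors)
distances = l2_squared_pdx(block, [1.0, 2.0, 3.0])
print(distances)
```

In a compiled language the inner loop is a simple pass over a contiguous array with no dependency between iterations, which is the kind of loop that compilers can typically vectorize without hand-written SIMD.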
Problem

Research questions and friction points this paper is trying to address.

Proposes PDX layout for efficient vector similarity search.
Enhances search speed with dimension-by-dimension strategy.
Introduces PDX-BOND for flexible, preprocessing-free dimension pruning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

PDX layout accelerates vector similarity search
Combines PDX with dimension-pruning algorithms
PDX-BOND enables flexible dimension-pruning without preprocessing
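To make the idea of dimension pruning on a vertical layout concrete, the sketch below uses a generic partial-distance bound: for squared L2 distance, the sum over the dimensions scanned so far can only grow, so a vector whose partial distance already exceeds a pruning threshold can be dropped. This is not the PDX-BOND, ADSampling, or BSA criterion, whose details are not given here. The function name `scan_block_pruned`, the `threshold` parameter (assumed to come from the best candidates found so far), and the `step` parameter are all hypothetical.

```python
def scan_block_pruned(block, query, threshold, step=2):
    """Scan one dimension-major block, pruning by partial squared L2 distance.

    block[d][i] is dimension d of vector i. After every `step` dimensions,
    vectors whose partial distance exceeds `threshold` are discarded, so
    later dimensions are scanned only for the survivors.
    """
    n_dims = len(block)
    n_vectors = len(block[0]) if block else 0
    partial = [0.0] * n_vectors
    alive = list(range(n_vectors))
    for start in range(0, n_dims, step):
        for d in range(start, min(start + step, n_dims)):
            q = query[d]
            column = block[d]
            for i in alive:
                diff = column[i] - q
                partial[i] += diff * diff
        # The partial sum is a lower bound on the full distance,
        # so dropping vectors above the threshold is safe.
        alive = [i for i in alive if partial[i] <= threshold]
        if not alive:
            break
    return [(i, partial[i]) for i in alive]


# Dimension-major block holding v0=[1,1,1,1], v1=[5,5,5,5], v2=[1,2,1,2].
block = [
    [1.0, 5.0, 1.0],
    [1.0, 5.0, 2.0],
    [1.0, 5.0, 1.0],
    [1.0, 5.0, 2.0],
]
survivors = scan_block_pruned(block, [1.0, 1.0, 1.0, 1.0], threshold=3.0)
print(survivors)
```

In this example the second vector is discarded after the first two dimensions, so its remaining dimensions are never read; the fewer dimensions that must be scanned fully, the less work the scan does.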
Leonardo Kuffo
CWI, Amsterdam, The Netherlands
Elena Krippner
CWI, Amsterdam, The Netherlands
Peter Boncz
Professor, VU University Amsterdam & CWI
Database Systems, Column Stores, Vectorized Execution, Graph/RDF querying