ASH: Asymmetric Scalar Hashing With Learned Dimensionality Reduction for High-Fidelity Vector Quantization

📅 2026-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the trade-off between compression ratio, accuracy, and efficiency in approximate nearest neighbor (ANN) search by proposing ASH, a data-driven asymmetric scalar hashing framework. Database vectors are reduced in dimension by a learned orthonormal projection and then scalar-quantized, while queries are kept in their original form. This asymmetric encoder-decoder design admits efficient similarity computation via SIMD operations. At equal compression ratios, ASH reports higher accuracy than the best additive and scalar quantizers, with state-of-the-art ANN recall and search speed across a variety of datasets and compression regimes. Its short learning and encoding times make it attractive for real-world deployment.
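The pipeline summarized above (project, scalar-quantize, score against an unquantized query) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the uniform quantizer, and the fixed range `[lo, hi]` are all assumptions made for illustration, and a random Gram-Schmidt basis stands in for the learned projection, which the listing does not describe in detail.

```python
import math
import random

def orthonormal_rows(k, d, seed=0):
    """Return k orthonormal rows in R^d (requires k <= d).

    ASSUMPTION: Gram-Schmidt on random Gaussian vectors is only a
    stand-in here; ASH learns its projection from data.
    """
    rng = random.Random(seed)
    rows = []
    while len(rows) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:  # remove components along rows already chosen
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows

def project(W, x):
    """Compute W x, mapping a d-dimensional vector to k dimensions."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def encode(W, x, bits, lo, hi):
    """Project a database vector, then quantize each coordinate to an
    integer code in [0, 2**bits - 1] (hypothetical uniform quantizer)."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [round((min(max(y, lo), hi) - lo) / step) for y in project(W, x)]

def score(W, q, codes, bits, lo, hi):
    """Asymmetric inner product: full-precision query vs. quantized codes.

    Uses <q, W^T y> == <W q, y>, so the query is projected once and the
    database vector is never decoded back to d dimensions.
    """
    step = (hi - lo) / ((1 << bits) - 1)
    qp = project(W, q)
    return sum(a * (lo + c * step) for a, c in zip(qp, codes))

W = orthonormal_rows(4, 8, seed=1)
x = [0.1, -0.2, 0.3, 0.0, 0.25, -0.1, 0.05, 0.2]   # database vector
q = [0.3, 0.1, -0.4, 0.2, 0.0, 0.5, -0.1, 0.2]     # query, left unquantized
codes = encode(W, x, bits=4, lo=-1.0, hi=1.0)
approx = score(W, q, codes, bits=4, lo=-1.0, hi=1.0)
```

Because each dequantized value has the affine form `lo + c * step`, the score splits into `lo * sum(qp) + step * sum(qp[i] * codes[i])`, so the per-vector work reduces to a dot product against small integer codes. That is the kind of loop that vectorizes well, though the actual SIMD kernels used by ASH are not specified in this listing.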
📝 Abstract
For a long time, additive quantizers, such as product quantization, have been considered the gold standard in terms of accuracy and efficiency. Recently, scalar quantization has re-emerged from the depths of history with a new wave of data-agnostic techniques. Inscribed in this general framework, we turn our attention to data-driven methods, showing that new highs in recall and speed can be achieved by reducing the number of dimensions while increasing the bitrate per dimension. Critically, this dimensionality reduction needs to be learned from data to be successful. We present ASH (Asymmetric Scalar Hashing), a data-driven encoder-decoder framework that applies dimensionality reduction to database vectors via a learned orthonormal projection, followed by scalar quantization, while keeping queries in their original form. This asymmetric design enables higher accuracy than the best additive and scalar quantizers at iso-compression, while admitting highly efficient similarity computations via SIMD operations. ASH has short learning and encoding times, making it attractive for real-world deployment. Extensive experiments on a variety of datasets demonstrate that ASH achieves state-of-the-art ANN recall and speeds across all compression regimes.
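The trade-off the abstract describes, fewer dimensions at a higher bitrate per dimension, can be made concrete with a small budget calculation. The sketch below is illustrative only: the 768-dimensional float32 input and the candidate bit widths are assumed example values, not settings taken from the paper.

```python
def iso_compression_configs(budget_bits, bit_options=(1, 2, 4, 8)):
    """List (dims, bits_per_dim) pairs whose code size equals budget_bits."""
    return [(budget_bits // b, b) for b in bit_options if budget_bits % b == 0]

def compression_ratio(orig_dims, dims, bits_per_dim, orig_bits_per_dim=32):
    """Ratio of the original size (float32 assumed) to the code size."""
    return (orig_dims * orig_bits_per_dim) / (dims * bits_per_dim)

# Hypothetical example: a 768-bit code for a 768-dimensional float32 vector.
configs = iso_compression_configs(768)
# -> [(768, 1), (384, 2), (192, 4), (96, 8)]
ratio = compression_ratio(768, 192, 4)
# -> 32.0
```

Every pair in `configs` occupies the same 768 bits, so they are comparable at iso-compression. Which point on this curve gives the best recall is the empirical question the paper studies, and the abstract stresses that the reduction has to be learned from data for the low-dimension, high-bitrate end to succeed.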
Problem

Research questions and friction points this paper is trying to address.

vector quantization
dimensionality reduction
approximate nearest neighbor
scalar quantization
high-fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

asymmetric scalar hashing
learned dimensionality reduction
vector quantization
orthonormal projection
SIMD acceleration