ColBERTSaR: Sparsified ColBERT Index via Product Quantization

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two efficiency bottlenecks of ColBERT: its large dense index, and the cost of gathering and decompressing document token embeddings at query time. The paper proposes an embedding quantization approach, using product quantization to compress token embeddings, that turns the ColBERT index into a compact inverted index. The authors show that ColBERT with embedding quantization is theoretically equivalent to learned sparse retrieval except for the scoring mechanism. Empirically, the resulting index is 50–70% smaller than a one-bit PLAID index while retaining retrieval effectiveness.

📝 Abstract

While ColBERT is an effective neural retrieval architecture, it requires a heavy index structure to support candidate set retrieval based on approximated token embeddings, gathering and decompressing document token embeddings, and applying the MaxSim operation. Indexes in PLAID and similar ColBERT implementations require five to ten times the disk storage of the original raw text, which limits their scalability. Furthermore, prior work has identified that the gathering and decompression stages are the primary inefficiencies at query time. Limiting the number of document tokens that must be gathered by thresholding and score approximation does not eliminate the need for the entire index to support ad hoc queries. In this work, we propose an embedding quantization approach that turns a ColBERT index into a true inverted index. We show that, theoretically, ColBERT with embedding quantization is equivalent to learned-sparse retrieval except for the scoring mechanism. Empirically, we demonstrate that our index is 50-70% smaller than a one-bit PLAID index while retaining retrieval effectiveness.
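
The idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the codebooks, the helper names (`pq_encode`, `build_index`, `maxsim_search`), and the choice to treat each full product-quantization code as an index "term" are assumptions made for illustration only. Each document token embedding is replaced by its code, posting lists map codes to documents, and scoring keeps ColBERT's MaxSim (sum over query tokens of the best-matching document term) rather than the summed term weights used in learned sparse retrieval.

```python
from collections import defaultdict

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def split(vec, m):
    """Split a vector into m equal-length subvectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def pq_encode(vec, codebooks):
    """Map each subvector to the index of its nearest centroid (squared L2)."""
    code = []
    for sub, book in zip(split(vec, len(codebooks)), codebooks):
        dists = [sum((s - c) ** 2 for s, c in zip(sub, cent)) for cent in book]
        code.append(dists.index(min(dists)))
    return tuple(code)

def build_index(docs, codebooks):
    """Inverted index: PQ code (acting as a 'term') -> ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, token_vecs in docs.items():
        for vec in token_vecs:
            index[pq_encode(vec, codebooks)].add(doc_id)
    return index

def maxsim_search(query_vecs, index, codebooks):
    """Sum over query tokens of the max approximate similarity per document."""
    scores = defaultdict(float)
    for q in query_vecs:
        q_subs = split(q, len(codebooks))
        # table[m][c] = inner product of query subvector m with centroid c
        table = [[dot(qs, cent) for cent in book]
                 for qs, book in zip(q_subs, codebooks)]
        best = {}
        # A real system would prune terms; this toy scans every posting list.
        for code, doc_ids in index.items():
            sim = sum(table[m][c] for m, c in enumerate(code))
            for d in doc_ids:
                if d not in best or sim > best[d]:
                    best[d] = sim
        for d, s in best.items():
            scores[d] += s
    return dict(scores)

# Hypothetical toy data: 4-dim embeddings, 2 subspaces, 2 centroids each.
codebooks = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
docs = {
    "d1": [[0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.2, 0.8]],
    "d2": [[0.1, 0.9, 0.9, 0.1]],
}
index = build_index(docs, codebooks)
scores = maxsim_search([[1.0, 0.0, 1.0, 0.0]], index, codebooks)
print(scores)
```

In this sketch the index stores only small integer codes and document ids, so no per-token embedding needs to be gathered or decompressed at query time; similarities come from a per-query lookup table over the centroids.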

Problem

Research questions and friction points this paper is trying to address.

ColBERT

index compression

neural retrieval

query efficiency

storage scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

ColBERT

product quantization

sparse retrieval