ColBERTSaR: Sparsified ColBERT Index via Product Quantization

📅 2026-06-03
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the efficiency bottlenecks of the ColBERT model, which stem from its large dense index and the high computational cost of token gathering and decompression during query processing. To overcome these limitations, the paper introduces embedding quantization into ColBERT for the first time, employing product quantization to compress token embeddings. This approach constructs a compact inverted index that is structurally equivalent to learned sparse retrieval, preserving the effectiveness of ColBERT’s MaxSim scoring mechanism while substantially improving efficiency. The resulting method achieves a 50%–70% reduction in index size compared to the current state-of-the-art 1-bit PLAID model, with comparable retrieval effectiveness.
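To make the summary's mechanism concrete, here is a minimal pure-Python sketch of how product quantization can turn token embeddings into an inverted index. It assumes textbook product quantization (each vector split into `m` equal sub-vectors, with plain k-means learning `k` centroids per subspace); the function names (`train_pq`, `encode`, `build_inverted_index`), the toy sizes, and the random data are hypothetical and are not taken from the paper.

```python
"""Sketch (hypothetical names and sizes): product-quantize token embeddings
and build an inverted index keyed by (subspace, centroid id) pairs.
Illustrates textbook PQ, not the ColBERTSaR implementation."""
import random
from collections import defaultdict


def nearest(p, centroids):
    """Index of the centroid with the smallest squared L2 distance to p."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the previous centroid if its cluster empties
                centroids[c] = [sum(x) / len(bucket) for x in zip(*bucket)]
    return centroids


def train_pq(vectors, m, k):
    """Split each vector into m equal sub-vectors; learn k centroids per subspace."""
    sub = len(vectors[0]) // m
    return [kmeans([v[i * sub:(i + 1) * sub] for v in vectors], k, seed=i)
            for i in range(m)]


def encode(v, codebooks):
    """Replace a vector by m small integers: one centroid id per subspace."""
    sub = len(v) // len(codebooks)
    return tuple(nearest(v[i * sub:(i + 1) * sub], cb)
                 for i, cb in enumerate(codebooks))


def build_inverted_index(docs, codebooks):
    """Posting lists keyed by (subspace, centroid id); each key plays the
    role of a vocabulary term in a learned sparse index."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for tok_pos, emb in enumerate(tokens):
            for subspace, code in enumerate(encode(emb, codebooks)):
                index[(subspace, code)].append((doc_id, tok_pos))
    return index


# Toy usage with hypothetical sizes (real ColBERT token embeddings are larger).
rng = random.Random(42)
dim, m, k = 8, 4, 4
docs = {d: [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
        for d in range(3)}
all_tokens = [t for toks in docs.values() for t in toks]
codebooks = train_pq(all_tokens, m, k)
index = build_inverted_index(docs, codebooks)
print(len(index), "distinct keys,",
      sum(len(p) for p in index.values()), "postings")
```

In this sketch every token contributes exactly `m` postings, and the index can have at most `m * k` distinct keys, which is what makes it look like a sparse index over a fixed vocabulary. With `k <= 256`, each stored code fits in one byte, so a token costs `m` bytes instead of `dim` floats.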
πŸ“ Abstract
While ColBERT is an effective neural retrieval architecture, it requires a heavy index structure to support candidate set retrieval based on approximated token embeddings, gathering and decompressing document token embeddings, and applying the MaxSim operation. Indexes in PLAID and similar ColBERT implementations require five to ten times the disk storage of the original raw text, which limits their scalability. Furthermore, prior work has identified the gathering and decompression stages as the primary inefficiencies at query time. Limiting the number of document tokens that must be gathered, through thresholding and score approximation, does not eliminate the need for the entire index to support ad hoc queries. In this work, we propose an embedding quantization approach that turns a ColBERT index into a true inverted index. We show that, theoretically, ColBERT with embedding quantization is equivalent to learned sparse retrieval except for the scoring mechanism. Empirically, we demonstrate that our index is 50-70% smaller than a one-bit PLAID index while retaining retrieval effectiveness.
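The abstract's point that a quantized ColBERT index differs from learned sparse retrieval only in scoring can be illustrated with a toy example. The sketch below assumes hand-picked codebooks and pre-encoded document tokens (all hypothetical values). It approximates each query/document token dot product with per-subspace lookup tables, a standard PQ technique, and the function names are illustrative. For clarity it scans every posting, so it should not be read as the paper's query algorithm.

```python
"""Sketch: approximate ColBERT MaxSim over PQ codes stored in an inverted
index. Codebooks, codes, and function names are hypothetical toy values."""
from collections import defaultdict

# Hypothetical PQ setup: 2 subspaces of 2 dims each, 2 centroids per subspace.
CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0]],   # subspace 0 centroids
    [[0.5, 0.5], [-1.0, 0.0]],  # subspace 1 centroids
]
# Document tokens already encoded as one centroid id per subspace.
DOC_CODES = {
    "d1": [(0, 0), (1, 1)],
    "d2": [(1, 0)],
}
# Two 4-dim query token embeddings.
QUERY = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, -1.0, 0.0]]


def build_index(doc_codes):
    """Postings keyed by (subspace, centroid id): the sparse 'vocabulary'."""
    index = defaultdict(list)
    for doc_id, tokens in doc_codes.items():
        for pos, codes in enumerate(tokens):
            for sub, code in enumerate(codes):
                index[(sub, code)].append((doc_id, pos))
    return index


def lookup_tables(q_vec, codebooks):
    """Dot product of each query sub-vector with every centroid of its subspace."""
    sub = len(q_vec) // len(codebooks)
    return [[sum(a * b for a, b in zip(q_vec[i * sub:(i + 1) * sub], c))
             for c in cb]
            for i, cb in enumerate(codebooks)]


def maxsim_scores(query, index, codebooks):
    """score(q, d) = sum over query tokens of the max over doc tokens of the
    PQ-approximated dot product."""
    totals = defaultdict(float)
    for q_vec in query:
        table = lookup_tables(q_vec, codebooks)
        # Like sparse retrieval: each posting adds a per-term weight. A doc
        # token has one posting per subspace, so its sum is the approximate
        # dot product with this query token.
        token_sims = defaultdict(float)
        for (sub, code), postings in index.items():
            for doc_id, pos in postings:
                token_sims[(doc_id, pos)] += table[sub][code]
        # Unlike sparse retrieval: only the best-matching doc token counts.
        best = {}
        for (doc_id, _), s in token_sims.items():
            best[doc_id] = max(s, best.get(doc_id, float("-inf")))
        for doc_id, s in best.items():
            totals[doc_id] += s
    return dict(totals)


print(maxsim_scores(QUERY, build_index(DOC_CODES), CODEBOOKS))
# → {'d1': 3.5, 'd2': 1.0}
```

The accumulation loop is exactly what a learned sparse scorer does with term weights. The per-document `max` over tokens, followed by the sum over query tokens, is the part that a plain sparse dot product cannot express.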
Problem

Research questions and friction points this paper is trying to address.

ColBERT
index compression
neural retrieval
query efficiency
storage scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

ColBERT
product quantization
sparse retrieval
inverted index
neural retrieval
πŸ”Ž Similar Papers
No similar papers found.