LEMUR: Learned Multi-Vector Retrieval

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational overhead and latency of multi-vector retrieval, which improves recall but remains impractical at large scale. The method formulates multi-vector similarity search as a supervised learning problem solved by a single-hidden-layer neural network, then reduces inference under that model to single-vector similarity search in the network's latent space. This reduction allows existing approximate nearest neighbor search (ANNS) techniques to be applied directly. The work is presented as the first learnable framework to reduce multi-vector search to single-vector search while preserving high recall. Experiments across multi-vector text and vision models, including ColBERTv2, report up to an order-of-magnitude speedup over earlier multi-vector similarity search methods.
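
The reduction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the source says no more than "one-hidden-layer neural network" and "single-vector similarity search in its latent space", so the ReLU activation, the sum pooling over query tokens, the layer sizes, the random (untrained) weights, and every name below are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden, n_docs = 8, 16, 100  # hypothetical sizes

# Hypothetical parameters. In a real system these would be trained so that
# the outputs approximate multi-vector similarity scores; here they are random.
W_hidden = rng.normal(size=(dim, hidden))
b_hidden = rng.normal(size=hidden)
W_out = rng.normal(size=(n_docs, hidden))  # one latent vector per document


def encode_query(query_tokens):
    """Map a set of token embeddings to a single latent vector (assumed pooling)."""
    h = np.maximum(query_tokens @ W_hidden + b_hidden, 0.0)  # ReLU, (n_tokens, hidden)
    return h.sum(axis=0)  # (hidden,)


def predict_scores(query_tokens):
    """Full forward pass: one predicted score per document."""
    return W_out @ encode_query(query_tokens)


def latent_top_k(query_tokens, k):
    """Inference as inner-product search over the rows of W_out.

    The exhaustive product below is a stand-in for a single-vector ANNS index.
    """
    z = encode_query(query_tokens)
    scores = W_out @ z
    return np.argsort(-scores)[:k]


query = rng.normal(size=(4, dim))  # a query with 4 token embeddings
top = latent_top_k(query, k=5)
```

Under these assumptions, the output layer is a matrix-vector product, so ranking documents by predicted score is a maximum inner product search over one vector per document, which is the kind of problem single-vector ANNS libraries are built for.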

📝 Abstract
Multi-vector representations generated by late interaction models, such as ColBERT, enable superior retrieval quality compared to single-vector representations in information retrieval applications. In multi-vector retrieval systems, both queries and documents are encoded using one embedding for each token, and similarity between queries and documents is measured by the MaxSim similarity measure. However, the improved recall of multi-vector retrieval comes at the expense of significantly increased latency. This necessitates designing efficient approximate nearest neighbor search (ANNS) algorithms for multi-vector search. In this work, we introduce LEMUR, a simple-yet-efficient framework for multi-vector similarity search. LEMUR consists of two consecutive problem reductions: We first formulate multi-vector similarity search as a supervised learning problem that can be solved using a one-hidden-layer neural network. Second, we reduce inference under this model to single-vector similarity search in its latent space, which enables the use of existing single-vector ANNS methods for speeding up retrieval. In addition to performance evaluation on ColBERTv2 embeddings, we evaluate LEMUR on embeddings generated by modern multi-vector text models and multi-vector visual document retrieval models. LEMUR is an order of magnitude faster than earlier multi-vector similarity search methods.
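
As a concrete reference for the MaxSim measure mentioned in the abstract, the following minimal sketch scores a query against a few documents by brute force. It assumes the inner product as the token-level similarity; the function names and the toy data are illustrative and do not come from the paper.

```python
import numpy as np


def maxsim(query_tokens, doc_tokens):
    """MaxSim: for each query token, take its best-matching document token, then sum."""
    sims = query_tokens @ doc_tokens.T  # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())


def brute_force_top_k(query_tokens, docs, k):
    """Score every document exactly; this is the slow baseline that ANNS methods avoid."""
    scores = np.array([maxsim(query_tokens, d) for d in docs])
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Toy data: documents may have different numbers of token embeddings.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
docs = [
    np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]),  # matches both query tokens
    np.array([[1.0, 0.0]]),                          # matches only the first token
    np.array([[-1.0, 0.0], [0.0, -1.0]]),            # points away from both tokens
]
top_ids, top_scores = brute_force_top_k(query, docs, k=2)
```

The exhaustive loop costs one token-by-token similarity matrix per document, which is why the latency of exact multi-vector search grows quickly with collection size.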
Problem

Research questions and friction points this paper is trying to address.

multi-vector retrieval
approximate nearest neighbor search
information retrieval
latency
similarity search
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-vector retrieval
approximate nearest neighbor search
late interaction models
supervised learning reduction
latent space embedding
Elias Jääsaari
Department of Computer Science, University of Helsinki, Helsinki, Finland
Ville Hyvönen
Department of Computer Science, University of Helsinki, Helsinki, Finland
Teemu Roos
Professor at University of Helsinki
Machine Learning
#UnivHelsinkiCS