jina-reranker-v3: Last but Not Late Interaction for Document Reranking

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the trade-off between efficiency and effectiveness in multilingual document reranking, this paper proposes a compact causal self-attention reranker. The architecture models query–document interaction in a unified context window using a “last-but-not-late” strategy: it preserves full semantic context without early truncation while avoiding the redundant autoregressive decoding overhead of generative models. It further incorporates multilingual pretraining and a context-sensitive end-token embedding extraction mechanism to strengthen cross-lingual semantic alignment. Evaluated on the BEIR benchmark, the model achieves a state-of-the-art nDCG@10 of 61.94 with roughly one-tenth the parameters of leading generative listwise rerankers, yielding substantial improvements in inference latency and deployment feasibility, particularly for resource-constrained multilingual applications.

📝 Abstract
jina-reranker-v3 is a 0.6B parameter multilingual document reranker that introduces a novel last but not late interaction. Unlike late interaction models such as ColBERT that perform separate encoding followed by multi-vector matching, our approach conducts causal self-attention between query and documents within the same context window, enabling rich cross-document interactions before extracting contextual embeddings from the last token of each document. This compact architecture achieves state-of-the-art BEIR performance with 61.94 nDCG@10 while being ten times smaller than generative listwise rerankers.
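To make the “last but not late” idea concrete, here is a toy sketch (not the paper’s actual implementation): the query and all candidate documents are packed into one sequence, a single untrained causal self-attention layer is applied, and each document is scored by comparing the hidden state at its last token against the hidden state at the query’s last token. The packing order, the single attention head, and the cosine-similarity scoring head are all simplifying assumptions made for illustration.

```python
import numpy as np

def causal_self_attention(x):
    """Toy single-head causal self-attention over a packed sequence (no learned weights)."""
    seq_len, dim = x.shape
    scores = x @ x.T / np.sqrt(dim)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def last_but_not_late_scores(query_emb, doc_embs):
    """query_emb: (T_q, d) token embeddings; doc_embs: list of (T_i, d) arrays.

    Packs query + documents into one context window so documents can interact
    with the query (and with earlier documents) *inside* attention, then reads
    out one contextual embedding per document from its last token.
    """
    packed = np.concatenate([query_emb] + doc_embs, axis=0)
    hidden = causal_self_attention(packed)
    q_vec = hidden[query_emb.shape[0] - 1]  # query's last-token state
    scores, pos = [], query_emb.shape[0]
    for d_emb in doc_embs:
        pos += d_emb.shape[0]
        d_vec = hidden[pos - 1]  # document's last-token state (contextual embedding)
        cos = q_vec @ d_vec / (np.linalg.norm(q_vec) * np.linalg.norm(d_vec) + 1e-9)
        scores.append(float(cos))
    return scores

rng = np.random.default_rng(0)
query = rng.normal(size=(3, 8))          # 3 query tokens, dim 8
docs = [rng.normal(size=(4, 8)), rng.normal(size=(5, 8))]
print(last_but_not_late_scores(query, docs))  # one relevance score per document
```

This contrasts with late-interaction models like ColBERT, where query and documents are encoded separately and only their output vectors are matched; here the interaction happens inside a single attention pass before any embeddings are extracted.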
Problem

Research questions and friction points this paper is trying to address.

Develop a compact multilingual reranker with a novel interaction mechanism
Enable cross-document interactions before embedding extraction
Achieve state-of-the-art performance with a smaller model size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal self-attention between query and documents in a shared context window
Contextual embeddings extracted from the last token of each document
Compact multilingual reranker with state-of-the-art performance
Feng Wang
Jina AI GmbH
Yuqing Li
East China Normal University
Han Xiao
Jina AI GmbH