jina-reranker-v3: Last but Not Late Interaction for Document Reranking

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the trade-off between efficiency and effectiveness in multilingual document reranking, this paper proposes a compact causal self-attention reranker. The architecture models query–document interaction in a unified context window using a “last-but-not-late” strategy: it preserves full semantic context without early truncation while avoiding the redundant autoregressive decoding overhead of generative models. It further incorporates multilingual pretraining and a context-sensitive end-token embedding extraction mechanism to strengthen cross-lingual semantic alignment. Evaluated on the BEIR benchmark, the model achieves a state-of-the-art nDCG@10 of 61.94 with roughly one-tenth the parameters of leading generative listwise rerankers, yielding substantial improvements in inference latency and deployment feasibility, particularly for resource-constrained multilingual applications.

📝 Abstract
jina-reranker-v3 is a 0.6B parameter multilingual document reranker that introduces a novel last but not late interaction. Unlike late interaction models such as ColBERT that perform separate encoding followed by multi-vector matching, our approach conducts causal self-attention between query and documents within the same context window, enabling rich cross-document interactions before extracting contextual embeddings from the last token of each document. This compact architecture achieves state-of-the-art BEIR performance with 61.94 nDCG@10 while being ten times smaller than generative listwise rerankers.
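To make the “last but not late” idea concrete, here is a toy sketch (not the paper’s actual implementation): the query and all candidate documents are packed into one sequence, a single untrained causal self-attention layer is applied, and each document is scored by comparing the hidden state at its last token against the hidden state at the query’s last token. The packing order, the single attention head, and the cosine-similarity scoring head are all simplifying assumptions made for illustration.

```python
import numpy as np

def causal_self_attention(x):
    """Toy single-head causal self-attention over a packed sequence (no learned weights)."""
    seq_len, dim = x.shape
    scores = x @ x.T / np.sqrt(dim)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def last_but_not_late_scores(query_emb, doc_embs):
    """query_emb: (T_q, d) token embeddings; doc_embs: list of (T_i, d) arrays.

    Packs query + documents into one context window so documents can interact
    with the query (and with earlier documents) *inside* attention, then reads
    out one contextual embedding per document from its last token.
    """
    packed = np.concatenate([query_emb] + doc_embs, axis=0)
    hidden = causal_self_attention(packed)
    q_vec = hidden[query_emb.shape[0] - 1]  # query's last-token state
    scores, pos = [], query_emb.shape[0]
    for d_emb in doc_embs:
        pos += d_emb.shape[0]
        d_vec = hidden[pos - 1]  # document's last-token state (contextual embedding)
        cos = q_vec @ d_vec / (np.linalg.norm(q_vec) * np.linalg.norm(d_vec) + 1e-9)
        scores.append(float(cos))
    return scores

rng = np.random.default_rng(0)
query = rng.normal(size=(3, 8))          # 3 query tokens, dim 8
docs = [rng.normal(size=(4, 8)), rng.normal(size=(5, 8))]
print(last_but_not_late_scores(query, docs))  # one relevance score per document
```

This contrasts with late-interaction models like ColBERT, where query and documents are encoded separately and only their output vectors are matched; here the interaction happens inside a single attention pass before any embeddings are extracted.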
Problem

Research questions and friction points this paper is trying to address.

Develop a compact multilingual reranker with a novel interaction mechanism
Enable cross-document interactions before embedding extraction
Achieve state-of-the-art performance with a smaller model size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal self-attention between query and documents in a shared context window
Contextual embeddings extracted from the last token of each document
Compact multilingual reranker with state-of-the-art performance
Feng Wang
Jina AI GmbH
Yuqing Li
East China Normal University
Han Xiao
Jina AI GmbH