XProvence: Zero-Cost Multilingual Context Pruning for Retrieval-Augmented Generation

📅 2026-01-26

🤖 AI Summary

This work addresses the inefficiency of multilingual retrieval-augmented generation (RAG) systems caused by redundant retrieved context. The problem is made harder because existing pruning methods generalize poorly across languages. To this end, we propose XProvence, a zero-cost multilingual context pruning approach. It integrates pruning directly into the reranker, so pruning adds no separate model call on top of reranking. Building on the Provence framework, XProvence is trained on 16 languages and relies on multilingual pretraining and cross-lingual transfer to support more than 100 languages. Experiments on four multilingual question answering benchmarks show that XProvence compresses context substantially with minimal-to-no performance degradation and outperforms strong baselines.

📝 Abstract

This paper introduces XProvence, a multilingual zero-cost context pruning model for retrieval-augmented generation (RAG), trained on 16 languages and supporting 100+ languages through effective cross-lingual transfer. Motivated by the growing use of RAG systems across diverse languages, we explore several strategies to generalize the Provence framework beyond English. Provence was the first to integrate efficient zero-cost context pruning directly into the re-ranking model. Across four multilingual question answering benchmarks, we show that XProvence prunes RAG contexts with minimal-to-no performance degradation and outperforms strong baselines. Our model is available at https://huggingface.co/naver/xprovence-reranker-bgem3-v2.
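
The "zero-cost" idea is that one reranker pass over a (query, passage) pair yields two things: a relevance score for ranking, and token-level signals for deciding what to prune. The following is a hypothetical sketch of how those signals could be turned into sentence-level pruning. Several details are illustrative assumptions, not the XProvence model or its Hugging Face interface:

- the names `RerankerOutput`, `prune_context` and `rerank_and_prune`;
- mean-probability aggregation per sentence;
- the 0.5 threshold.

```python
from dataclasses import dataclass


@dataclass
class RerankerOutput:
    """HYPOTHETICAL result of ONE forward pass over a (query, passage) pair."""
    rerank_score: float            # relevance score used to rank the passage
    token_keep_probs: list[float]  # per-token probability of keeping that token


def prune_context(
    sentences: list[str],
    sentence_token_spans: list[tuple[int, int]],
    output: RerankerOutput,
    threshold: float = 0.5,  # illustrative value, not taken from the paper
) -> str:
    """Keep sentences whose mean token keep-probability exceeds `threshold`.

    No extra model call happens here: the decision reuses the token-level
    probabilities produced in the same pass as the reranking score.
    """
    kept = []
    for sentence, (start, end) in zip(sentences, sentence_token_spans):
        probs = output.token_keep_probs[start:end]
        mean_prob = sum(probs) / len(probs) if probs else 0.0
        if mean_prob > threshold:
            kept.append(sentence)
    return " ".join(kept)


def rerank_and_prune(candidates, top_k=2, threshold=0.5):
    """Sort candidates by rerank score, then prune only the top-k passages.

    Each candidate is a (sentences, spans, RerankerOutput) triple.
    """
    ranked = sorted(candidates, key=lambda c: c[2].rerank_score, reverse=True)
    return [
        (prune_context(sents, spans, out, threshold), out.rerank_score)
        for sents, spans, out in ranked[:top_k]
    ]


# Usage with made-up probabilities: the off-topic middle sentence is dropped.
sentences = [
    "Paris is the capital of France.",
    "It rained yesterday.",
    "The Eiffel Tower is in Paris.",
]
spans = [(0, 6), (6, 10), (10, 16)]
out = RerankerOutput(
    rerank_score=3.2,
    token_keep_probs=[0.9] * 6 + [0.05] * 4 + [0.7] * 6,
)
print(prune_context(sentences, spans, out))
# → Paris is the capital of France. The Eiffel Tower is in Paris.
```

Scoring whole sentences rather than individual tokens keeps the pruned context grammatical. Pruning only the top-k passages after ranking means no compute is spent on passages the generator will never see.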

Problem

Research questions and friction points this paper is trying to address.

multilingual

context pruning

retrieval-augmented generation

zero-cost

RAG

Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-cost pruning

multilingual RAG

cross-lingual transfer