🤖 AI Summary
To address the high computational cost and inefficiency of large language models (LLMs) in document re-ranking, particularly for long documents, this paper proposes a lightweight reranker built on document embedding compression and knowledge distillation. First, long documents are mapped to fixed-dimensional dense vectors, drastically shortening the input sequence. Second, the reranker is taught to operate on these compressed inputs via knowledge distillation, using ranking supervision signals generated by a powerful teacher LLM. To the authors' knowledge, this is the first work to explicitly feed compressed document representations into LLM-based re-ranking, sidestepping the efficiency bottleneck in long-document scenarios. Experiments show that, although built on a billion-parameter model, the proposed reranker matches or surpasses smaller baseline rerankers in ranking effectiveness across mainstream benchmarks while achieving a 2.3× inference speedup, unifying effectiveness and efficiency.
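The compression step above replaces a long token sequence with a small, fixed number of dense vectors before the reranker ever sees the document. The paper's compressor is a learned model; the sketch below is only a toy stand-in (chunked mean pooling) to make the input-length arithmetic concrete. The function name `compress` and the slot count of 16 are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def compress(token_embeddings, num_slots=16):
    """Toy stand-in for a learned document compressor: pool T token
    vectors into num_slots fixed vectors by averaging contiguous chunks.
    A real compressor would be trained end to end."""
    chunks = np.array_split(token_embeddings, num_slots, axis=0)
    return np.stack([c.mean(axis=0) for c in chunks])

# A long document: 2048 token embeddings of dimension 64 (random placeholder data).
doc = np.random.rand(2048, 64)
compressed = compress(doc, num_slots=16)
# The reranker's input length drops from 2048 positions to 16,
# which is where the latency savings for long documents come from.
```

Whatever the compressor's internals, the key property is that its output size is fixed, so reranking cost no longer grows with document length.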
📝 Abstract
Reranking, the process of refining the output of a first-stage retriever, is often considered computationally expensive, especially with Large Language Models. Borrowing from recent advances in document compression for RAG, we reduce the input size by compressing documents into fixed-size embedding representations. We then teach a reranker to use these compressed inputs via distillation. Although based on a billion-parameter model, our reranker, operating on compressed inputs, can challenge smaller rerankers in both effectiveness and efficiency, especially for long documents. Given that text compressors are still in their early stages of development, we view this approach as promising.
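The distillation described above can be framed as matching the student's ranking distribution over a query's candidate documents to the teacher's. A common formulation, sketched here as an assumption rather than the paper's exact objective, is a KL divergence between temperature-softened score distributions; the function names and the temperature value are illustrative.

```python
import numpy as np

def softmax(scores, temperature=1.0):
    """Softmax over relevance scores, with max-subtraction for stability."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL(teacher || student) over the candidate list for one query.
    The student reads compressed embeddings; the teacher reads full text."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Hypothetical relevance scores for 4 candidate documents of one query.
teacher = [3.2, 1.0, -0.5, 0.3]   # large teacher LLM reranker
student = [2.9, 1.1, -0.2, 0.4]   # student on compressed inputs
loss = distillation_loss(teacher, student)
```

Because the target is a distribution over candidates rather than hard labels, the student learns relative orderings, which is what ranking metrics ultimately reward.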