🤖 AI Summary
This work addresses the high inference cost of large language models (LLMs) caused by contextual redundancy. ARC-Encoder is a general-purpose context compression method that requires no modification to the target LLM: it learns compact, continuous representations of the context that replace the original token embeddings, injected directly at the decoder's input layer. Its core contributions are a unified encoder architecture designed for compatibility across diverse LLM families, continuous representation learning, and a systematic training strategy, which together enable 4x-8x context compression without fine-tuning the target model. Experiments show significant reductions in inference latency and GPU memory consumption on both instruction-tuned and base LLMs, with plug-and-play deployment, achieving state-of-the-art performance on several benchmarks while improving computational efficiency at inference.
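The injection mechanism can be sketched as follows. This is a toy stand-in only: the real ARC-Encoder is a learned neural compressor, not mean pooling, and all names, shapes, and the pooling scheme here are illustrative assumptions. It shows the essential interface, namely that the context's token embeddings are replaced by x-times fewer continuous vectors, which are then prepended to the ordinary prompt embeddings at the decoder's input layer:

```python
import numpy as np

def compress_context(token_embeddings, x=4):
    """Toy compressor: average-pool groups of x token embeddings into one
    continuous vector (stand-in for ARC-Encoder's learned encoder)."""
    n, d = token_embeddings.shape
    n_keep = (n // x) * x  # drop the remainder for simplicity
    return token_embeddings[:n_keep].reshape(-1, x, d).mean(axis=1)

# 512 context tokens, hypothetical model dimension 64
ctx_embeddings = np.random.randn(512, 64)
compressed = compress_context(ctx_embeddings, x=8)  # 8x fewer vectors: (64, 64)

# The compressed vectors replace the context in the decoder's input sequence,
# concatenated with the embeddings of the remaining prompt tokens.
prompt_embeddings = np.random.randn(16, 64)
decoder_input = np.concatenate([compressed, prompt_embeddings], axis=0)  # (80, 64)
```

The decoder then attends over 80 input positions instead of 528, which is where the latency and memory savings come from; in the actual method the compressed representations are trained so the frozen decoder can still condition on the original context's content.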
📝 Abstract
Recent techniques such as retrieval-augmented generation or chain-of-thought reasoning have led to longer contexts and increased inference costs. Context compression techniques can reduce these costs, but the most effective approaches require fine-tuning the target model or even modifying its architecture, which can degrade the model's general abilities when it is not used for this specific purpose. Here we explore an alternative approach: an encoder that compresses the context into continuous representations which replace token embeddings in decoder LLMs. First, we perform a systematic study of training strategies and architecture choices for the encoder. Our findings led to the design of an Adaptable text Representations Compressor, named ARC-Encoder, which outputs $x$-times fewer continuous representations (typically $x\in\{4,8\}$) than text tokens. We evaluate ARC-Encoder across a variety of LLM usage scenarios, ranging from in-context learning to context window extension, on both instruct and base decoders. Results show that ARC-Encoder achieves state-of-the-art performance on several benchmarks while improving computational efficiency at inference. Finally, we demonstrate that our models can be adapted to multiple decoders simultaneously, allowing a single encoder to generalize across different decoder LLMs. This makes ARC-Encoder a flexible and efficient solution for portable encoders that work seamlessly with multiple LLMs. We release the training code at https://github.com/kyutai-labs/ARC-Encoder ; the fine-tuning dataset and pretrained models are available at https://huggingface.co/collections/kyutai/arc-encoders-68ee18787301407d60a57047 .