SCOPE: A Generative Approach for LLM Prompt Compression

📅 2025-08-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM prompt compression methods predominantly rely on token deletion, often resulting in grammatical incompleteness, semantic fragmentation, and critical information loss—thereby degrading generation quality. To address this, we propose a generative prompt compression framework that replaces naïve pruning with semantic-aware chunking, generative abstractive rewriting, keyword-constrained retention, anomalous chunk detection, and dynamic compression-ratio control. These mechanisms jointly enable active semantic reconstruction and refinement during compression, preserving logical coherence and factual integrity while substantially reducing input length. Extensive experiments across diverse question-answering and summarization benchmarks demonstrate consistent superiority over state-of-the-art baselines. Notably, the method maintains high generation quality and robustness even under high compression ratios (>50%), validating the effectiveness and practicality of the generative paradigm for prompt compression.
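The paper does not publish code, but the chunk-then-rewrite loop it describes can be sketched as follows. This is a minimal hypothetical illustration: `chunk_by_sentences` stands in for the semantic-aware chunker, and `toy_summarize` is a trivial placeholder for the generative abstractive rewriter (in the actual method, an LLM rewrites each chunk).

```python
import re

def chunk_by_sentences(text, max_chars=200):
    """Greedily pack sentences into chunks of at most max_chars characters.
    A crude stand-in for the paper's semantic-aware chunking."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = (current + " " + s).strip()
    if current:
        chunks.append(current)
    return chunks

def compress_prompt(text, summarize, max_chars=200):
    """Split the prompt into chunks, rewrite each, and reassemble."""
    return " ".join(summarize(c) for c in chunk_by_sentences(text, max_chars))

def toy_summarize(chunk):
    """Placeholder rewriter: keep the first half of each chunk's words.
    The real method calls a generative model here."""
    words = chunk.split()
    return " ".join(words[: max(1, len(words) // 2)])
```

Because each chunk is rewritten rather than pruned token-by-token, the reassembled prompt stays grammatically complete, which is the core advantage the summary claims over deletion-based methods.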

📝 Abstract
Prompt compression methods enhance the efficiency of Large Language Models (LLMs) and reduce cost by shortening the input context. The goal of prompt compression is to shorten the LLM prompt while maintaining high generation quality. However, existing solutions, mainly based on token removal, face challenges such as information loss and structural incoherence, like missing grammar elements in a sentence or incomplete word phrases after token removal. Such challenges limit the final generation quality of the LLM. To overcome these limitations, we present a novel generative prompt compression method. Unlike existing token removal methods, our method centers on a chunking-and-summarization mechanism. Specifically, our method splits the prompt into semantically coherent chunks and rewrites the chunks to be more concise; the rewritten chunks are finally reassembled into a coherent prompt. We design several optimization techniques for this mechanism, including optimized semantic chunking, outlier chunk handling, dynamic compression ratio, compression prioritization, and keyword maintaining. These techniques improve the identification and preservation of critical information and textual coherence, while providing finer-grained control of the compression ratio. We conduct extensive evaluation on question-answering and summarization tasks, with datasets covering multiple domains. The evaluation shows our method achieves significantly better compression quality and higher stability than state-of-the-art methods, especially under high compression ratios, which proves the effectiveness and practicality of our method.
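One of the techniques the abstract lists, keyword maintaining, can be illustrated with a short sketch. The function names and the frequency-based keyword heuristic below are hypothetical simplifications; the paper presumably uses a more principled keyword identification step.

```python
import re
from collections import Counter

def extract_keywords(text, k=5):
    """Pick the k most frequent non-trivial words as keywords (a crude
    stand-in for the paper's keyword identification)."""
    stop = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on"}
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text) if len(w) > 3]
    words = [w for w in words if w not in stop]
    return [w for w, _ in Counter(words).most_common(k)]

def enforce_keywords(summary, keywords):
    """Re-append any keyword the rewritten chunk dropped, so critical
    terms survive compression."""
    missing = [kw for kw in keywords if kw not in summary.lower()]
    if not missing:
        return summary
    return summary + " [keywords: " + ", ".join(missing) + "]"
```

The check runs after each chunk is rewritten, guarding against the abstractive rewriter silently discarding terms the downstream task needs.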
Problem

Research questions and friction points this paper is trying to address.

Compress LLM prompts while maintaining generation quality
Overcome information loss from token removal methods
Ensure semantic coherence and structural integrity in compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative chunking and summarization mechanism
Optimized semantic chunking with dynamic compression
Keyword maintaining and compression prioritization techniques
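The dynamic-compression and prioritization ideas above amount to giving important chunks a higher keep-ratio while still hitting an overall target. A minimal sketch, assuming per-chunk importance scores are already available (the scoring itself is not specified here):

```python
def allocate_ratios(importances, target=0.5, floor=0.2, cap=0.9):
    """Assign each chunk a keep-ratio proportional to its importance,
    scaled so the mean matches the overall target, then clamped so no
    chunk is compressed too aggressively or left nearly untouched."""
    mean_imp = sum(importances) / len(importances)
    raw = [target * imp / mean_imp for imp in importances]
    return [min(cap, max(floor, r)) for r in raw]
```

For example, with importances `[1.0, 3.0]` and a target ratio of 0.5, the less important chunk keeps 25% of its length and the more important one keeps 75%, averaging to the requested 50%.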