Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention

📅 2025-05-21
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
To address computational redundancy and information loss in long-sequence reasoning with large language models (LLMs), this paper proposes HyCo$_2$, a global-local collaborative hybrid context compression method. The approach integrates global semantic modeling with local token importance estimation: a hybrid adapter refines global semantics, a probabilistic token classification layer makes fine-grained per-token retention decisions, and auxiliary paraphrasing and completion pretraining before instruction tuning strengthens instruction alignment. Across seven knowledge-intensive question-answering benchmarks, the method improves the performance of various LLM series by an average of 13.1%, and it matches uncompressed full-context baselines while reducing token consumption by 88.8%, demonstrating gains in both inference efficiency and fidelity for long-context reasoning.
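
The summary above describes two cooperating compression paths: a hybrid adapter that distills global semantics and a classification layer that keeps locally important tokens. Below is a minimal sketch of how such a pipeline could be wired together; the module names, the cross-attention adapter, the keep ratio, and the concatenation of soft and retained tokens are assumptions for illustration, not the paper's exact architecture.

```python
# Hypothetical sketch of a global-local hybrid compression forward pass.
# Names and wiring are assumptions; see the paper for the actual HyCo_2 design.
import torch
import torch.nn as nn


class HybridCompressor(nn.Module):
    def __init__(self, hidden: int, n_global_tokens: int = 16, keep_ratio: float = 0.112):
        super().__init__()
        # Global view: learned queries cross-attend over the context to form soft tokens.
        self.global_queries = nn.Parameter(torch.randn(n_global_tokens, hidden))
        self.global_adapter = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        # Local view: a classification layer scoring each token for retention.
        self.retain_head = nn.Linear(hidden, 1)
        self.keep_ratio = keep_ratio  # e.g. keep ~11.2% of tokens (an 88.8% reduction)

    def forward(self, ctx_hidden: torch.Tensor) -> torch.Tensor:
        # ctx_hidden: (batch, seq_len, hidden) encoder states of the long context.
        b, t, h = ctx_hidden.shape
        # Global semantics: distill the whole context into a few soft tokens.
        queries = self.global_queries.unsqueeze(0).expand(b, -1, -1)
        global_tokens, _ = self.global_adapter(queries, ctx_hidden, ctx_hidden)
        # Local details: per-token retention probability, keep the top fraction.
        p_retain = torch.sigmoid(self.retain_head(ctx_hidden)).squeeze(-1)  # (b, t)
        k = max(1, int(t * self.keep_ratio))
        keep_idx = p_retain.topk(k, dim=-1).indices.sort(dim=-1).values
        local_tokens = torch.gather(
            ctx_hidden, 1, keep_idx.unsqueeze(-1).expand(-1, -1, h)
        )
        # Compressed context: soft global tokens followed by retained hard tokens.
        return torch.cat([global_tokens, local_tokens], dim=1)
```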

📝 Abstract
Large Language Models (LLMs) encounter significant challenges in long-sequence inference due to computational inefficiency and redundant processing, driving interest in context compression techniques. Existing methods often rely on token importance to perform hard local compression or encode context into latent representations for soft global compression. However, the uneven distribution of textual content relevance and the diversity of demands for user instructions mean these approaches frequently lead to the loss of potentially valuable information. To address this, we propose $\textbf{Hy}$brid $\textbf{Co}$ntext $\textbf{Co}$mpression (HyCo$_2$) for LLMs, which integrates both global and local perspectives to guide context compression while retaining both the essential semantics and critical details for task completion. Specifically, we employ a hybrid adapter to refine global semantics with the global view, based on the observation that different adapters excel at different tasks. Then we incorporate a classification layer that assigns a retention probability to each context token based on the local view, determining whether it should be retained or discarded. To foster a balanced integration of global and local compression, we introduce auxiliary paraphrasing and completion pretraining before instruction tuning. This promotes a synergistic integration that emphasizes instruction-relevant information while preserving essential local details, ultimately balancing local and global information retention in context compression. Experiments show that our HyCo$_2$ method significantly enhances long-text reasoning while reducing token usage. It improves the performance of various LLM series by an average of 13.1% across seven knowledge-intensive QA benchmarks. Moreover, HyCo$_2$ matches the performance of uncompressed methods while reducing token consumption by 88.8%.
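
As a rough formalization of the local view described in the abstract, the per-token retention decision can be read as binary classification; the specific scoring function, symbols, and thresholding rule below are illustrative assumptions rather than the paper's exact formulation:

$$p_i = \sigma\!\left(\mathbf{w}^{\top}\mathbf{h}_i + b\right), \qquad m_i = \mathbb{1}\!\left[p_i \ge \tau\right],$$

where $\mathbf{h}_i$ is the contextual hidden state of token $i$, $p_i$ its retention probability, and $m_i$ the keep/discard decision; the threshold $\tau$ (or an equivalent target keep ratio) sets the overall compression rate.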
Problem

Research questions and friction points this paper is trying to address.

Balancing local and global information retention in context compression
Addressing computational inefficiency in long-sequence LLM inference
Reducing token usage while maintaining task performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid adapter refines global semantics from the global view
Probabilistic classification layer assigns a per-token retention probability from the local view
Auxiliary paraphrasing and completion pretraining balances global and local compression (a sketch of this stage follows the list)
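
The auxiliary pretraining named above can be pictured as a simple two-task objective applied before instruction tuning. The following sketch assumes a Hugging Face style causal LM interface (`inputs_embeds`, `labels`) and an even 0.5/0.5 loss weighting; helper names and the task construction are illustrative assumptions, not the paper's recipe.

```python
# Hypothetical sketch of the auxiliary pretraining stage: the compressor is
# trained so the LLM can both paraphrase (reconstruct the context) and
# complete (continue the context) from the compressed view alone.
import torch


def lm_loss(llm, prefix_embeds, target_ids):
    # Generate `target_ids` conditioned on the compressed prefix; prefix
    # positions are masked out of the loss with label -100 (HF convention).
    tgt_embeds = llm.get_input_embeddings()(target_ids)
    inputs = torch.cat([prefix_embeds, tgt_embeds], dim=1)
    ignore = torch.full(prefix_embeds.shape[:2], -100,
                        dtype=torch.long, device=target_ids.device)
    labels = torch.cat([ignore, target_ids], dim=1)
    return llm(inputs_embeds=inputs, labels=labels).loss


def auxiliary_pretrain_step(llm, compressor, ctx_hidden, ctx_ids, next_ids, alpha=0.5):
    compressed = compressor(ctx_hidden)  # soft global tokens + retained hard tokens
    # Paraphrasing objective: regenerate the original context
    # (checks that global semantics survive compression).
    para = lm_loss(llm, compressed, ctx_ids)
    # Completion objective: continue the context
    # (checks that locally important details survive compression).
    comp = lm_loss(llm, compressed, next_ids)
    loss = alpha * para + (1 - alpha) * comp
    loss.backward()
    return loss.item()
```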
Huanxuan Liao
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing, Large Language Model, Long Context Modeling
Wen Hu
Ant Group
Yao Xu
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Shizhu He
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Jun Zhao
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Kang Liu
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences