GMSA: Enhancing Context Compression via Group Merging and Layer Semantic Alignment

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost and information redundancy of long-context reasoning with large language models (LLMs), this paper proposes GMSA, an encoder-decoder-based context compression framework. Its core contributions are threefold: (1) Group Merging, which efficiently extracts summary vectors from the original context; (2) Layer Semantic Alignment (LSA), which aligns the high-level summary vectors with the low-level primary input semantics to bridge the semantic gap between layers; and (3) Knowledge Extraction Fine-Tuning (KEFT), which adapts the soft tokens learned by autoencoder training to downstream tasks, with the compression rate sampled randomly for each training sample. In context restoration, GMSA significantly outperforms the traditional compression paradigm and converges stably and considerably faster with only a few encoder layers. In downstream question-answering tasks, it achieves approximately a 2× end-to-end inference speedup while outperforming both the original input prompts and various SOTA methods.

📝 Abstract
Large language models (LLMs) have achieved impressive performance in a variety of natural language processing (NLP) tasks. However, when applied to long-context scenarios, they face two challenges: low computational efficiency and substantial redundant information. This paper introduces GMSA, a context compression framework based on the encoder-decoder architecture, which addresses these challenges by reducing input sequence length and redundant information. Structurally, GMSA has two key components: Group Merging and Layer Semantic Alignment (LSA). Group Merging is used to effectively and efficiently extract summary vectors from the original context. Layer Semantic Alignment, on the other hand, aligns the high-level summary vectors with the low-level primary input semantics, thus bridging the semantic gap between different layers. In the training process, GMSA first learns soft tokens that contain complete semantics through autoencoder training. To further adapt GMSA to downstream tasks, we propose Knowledge Extraction Fine-tuning (KEFT) to extract knowledge from the soft tokens for downstream tasks. We train GMSA by randomly sampling the compression rate for each sample in the dataset. Under this condition, GMSA not only significantly outperforms the traditional compression paradigm in context restoration but also achieves stable and significantly faster convergence with only a few encoder layers. In downstream question-answering (QA) tasks, GMSA can achieve approximately a 2x speedup in end-to-end inference while outperforming both the original input prompts and various state-of-the-art (SOTA) methods by a large margin.
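The abstract states that the compression rate is sampled randomly for each training sample. A minimal sketch of that idea follows. The candidate rates, the function names, and the reading of a rate r as "about r context tokens per soft token" are assumptions made for illustration and are not taken from the paper.

```python
import random

# Hypothetical candidate compression rates; the abstract does not list them.
CANDIDATE_RATES = (4, 8, 16)


def sample_compression_rates(num_samples, rates=CANDIDATE_RATES, seed=None):
    """Draw one compression rate per training sample, uniformly at random."""
    rng = random.Random(seed)
    return [rng.choice(rates) for _ in range(num_samples)]


def num_soft_tokens(context_len, rate):
    """Soft tokens needed if each one covers up to `rate` context tokens
    (assumed interpretation of 'compression rate'); uses ceiling division."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return -(-context_len // rate)


if __name__ == "__main__":
    # Each sample in a toy batch gets its own independently drawn rate.
    for length, rate in zip([512, 300, 1024], sample_compression_rates(3, seed=0)):
        print(length, rate, num_soft_tokens(length, rate))
```

Sampling the rate per sample, rather than fixing one rate for a whole run, exposes a single model to several compression levels during training, which is one plausible reading of why the abstract ties this choice to stable behaviour across rates.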
Problem

Research questions and friction points this paper is trying to address.

Improving computational efficiency in long-context LLMs
Reducing redundant information in input sequences
Bridging semantic gaps between different model layers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Group Merging extracts summary vectors efficiently
Layer Semantic Alignment bridges semantic gaps
Knowledge Extraction Fine-tuning adapts to downstream tasks
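As a rough illustration of how Group Merging could turn token-level hidden states into a shorter sequence of summary vectors, the sketch below splits the sequence into consecutive groups and averages each group. Mean pooling as the merge operation, the function name `group_merge`, and the list-of-lists representation are assumptions for this example; the text above does not specify the merge operation.

```python
def group_merge(hidden_states, group_size):
    """Merge consecutive groups of token vectors into one summary vector each.

    hidden_states: list of equal-length lists of floats (one per token).
    group_size: tokens per group (assumed to correspond to the compression rate).
    The merge here is a plain mean, which is an assumption of this sketch.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    summaries = []
    for start in range(0, len(hidden_states), group_size):
        group = hidden_states[start:start + group_size]
        dim = len(group[0])
        # Average each dimension over the tokens in this group; the last
        # group may be shorter than group_size.
        summaries.append(
            [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
        )
    return summaries


if __name__ == "__main__":
    states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(group_merge(states, 2))
```

With a group size of 2, the three input vectors reduce to two summary vectors, so the sequence passed to the decoder shrinks roughly in proportion to the group size.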
Authors

Jiwei Tang
Tsinghua University
Natural Language Processing, Large Language Model

Zhicheng Zhang
Carnegie Mellon University
Reinforcement Learning, Explainable RL

Shunlong Wu
Tsinghua University

Jingheng Ye
Tsinghua University

Lichen Bai
HKUST (GZ)
Generative AI

Zitai Wang
Institute of Computing Technology, Chinese Academy of Sciences
Machine learning, Data mining, AUC optimization

Tingwei Lu
Tsinghua University

Jiaqi Chen
Tsinghua University

Lin Hai
Tsinghua University

Hai-Tao Zheng
Pengcheng Laboratory

Hong-Gee Kim
Seoul National University