🤖 AI Summary
To address the challenge of suboptimal preset frequency scaling factors in RoPE-based context window extension—where the search space grows exponentially—this paper proposes a lightweight Phase Shift Calibration (PSC) module. PSC introduces, for the first time, a learnable phase shift mechanism that dynamically calibrates preconfigured frequency scaling (e.g., in PI, YaRN, and LongRoPE) without altering the original RoPE architecture or requiring model retraining. Its core components include differentiable phase calibration, a lightweight linear projection, and context-length-adaptive initialization. Experiments demonstrate that PSC consistently reduces perplexity on long-context benchmarks (16K–64K tokens), exhibits robust cross-model (Llama, Qwen) and cross-task (QA, long-document reasoning) performance, and delivers increasingly substantial gains as context length grows. Overall, PSC significantly enhances the robustness and plug-and-play applicability of existing RoPE extension methods.
📝 Abstract
Rotary Position Embedding (RoPE) is an efficient position encoding approach that is widely used in numerous large language models (LLMs). Recently, many methods have been proposed to further expand the context window based on RoPE. The core idea of these methods is to predefine or search for a set of factors to rescale the base frequencies of RoPE. However, predefining an optimal set of factors is challenging for existing methods because the search space grows exponentially. To address this, we introduce PSC (Phase Shift Calibration), a small module that calibrates the frequencies predefined by existing methods. With PSC, we demonstrate that many existing methods, such as PI, YaRN, and LongRoPE, can be further enhanced. We conducted extensive experiments across multiple models and tasks. The results demonstrate that (1) when PSC is enabled, the relative reductions in perplexity increase as the context window size grows from 16k to 32k, and up to 64k; and (2) our approach is broadly applicable and exhibits robustness across a variety of models and tasks.
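To make the mechanism described above concrete, the following is a minimal sketch of RoPE rotation angles with (a) a predefined frequency rescaling factor, as in PI, and (b) an additive per-frequency phase offset standing in for the calibration term that a PSC-style module would learn. The function names, the signature, and the treatment of the offset as a simple additive vector are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0, phase_shift=None):
    """Rotation angles for RoPE.

    scale: predefined frequency rescaling factor; e.g. PI effectively uses
        scale = L_train / L_target (< 1 when extending the context window).
    phase_shift: hypothetical per-frequency additive offsets standing in for
        a learned PSC-style calibration; shape (dim // 2,). Zeros (or None)
        recover plain rescaled RoPE.
    """
    freqs = base ** (-np.arange(0, dim, 2) / dim)  # (dim/2,) base frequencies
    freqs = freqs * scale                          # predefined rescaling (e.g. PI)
    angles = np.outer(positions, freqs)            # (num_positions, dim/2)
    if phase_shift is not None:
        angles = angles + phase_shift              # calibration offsets
    return angles

def apply_rope(x, angles):
    """Rotate consecutive channel pairs of x by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because the offsets enter only through the rotation angles and the original frequency schedule is untouched, a calibration of this shape can sit on top of any predefined rescaling (PI, YaRN, LongRoPE) without modifying the underlying RoPE architecture, which matches the plug-and-play property the abstract describes.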