CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of out-of-distribution generalization and semantic modeling faced by Rotary Position Embedding (RoPE) in large language models under long-context scenarios. The authors propose CoPE, an extremely simple soft truncation strategy that suppresses low-frequency components in RoPE without requiring additional training or architectural modifications. This approach simultaneously mitigates out-of-distribution positional anomalies, enhances semantic attention, and avoids spectral leakage caused by hard truncation. Evaluated on contexts up to 256k tokens, CoPE achieves significant performance improvements and establishes state-of-the-art length generalization capabilities.

📝 Abstract
Rotary Positional Embedding (RoPE) is a key component of context scaling in Large Language Models (LLMs). While various methods have been proposed to adapt RoPE to longer contexts, their guiding principles generally fall into two categories: (1) out-of-distribution (OOD) mitigation, which scales RoPE frequencies to accommodate unseen positions, and (2) semantic modeling, which posits that the attention scores computed with RoPE should always prioritize semantically similar tokens. In this work, we unify these seemingly distinct objectives through a minimalist intervention, namely CoPE: soft clipping of the low-frequency components of RoPE. CoPE not only eliminates OOD outliers and refines semantic signals, but also prevents the spectral leakage caused by hard clipping. Extensive experiments demonstrate that simply applying our soft clipping strategy to RoPE yields significant performance gains that scale up to 256k context length, validating our theoretical analysis and establishing CoPE as a new state-of-the-art for length generalization. Our code, data, and models are available at https://github.com/hrlics/CoPE.
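The core idea contrasted in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the sigmoid gate, its `cutoff`, and its `sharpness` are hypothetical choices standing in for whatever soft-clipping function CoPE actually uses; only the standard RoPE frequency schedule is taken as given.

```python
import numpy as np

def rope_freqs(dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE inverse frequencies: theta_i = base^(-2i/dim)."""
    return base ** (-np.arange(0, dim, 2) / dim)

def hard_clip(freqs: np.ndarray, cutoff: float) -> np.ndarray:
    # Hard truncation: zero out low-frequency components outright.
    # The sharp edge in the spectrum is the source of the "spectral
    # leakage" that the abstract says CoPE avoids.
    return np.where(freqs < cutoff, 0.0, freqs)

def soft_clip(freqs: np.ndarray, cutoff: float,
              sharpness: float = 8.0) -> np.ndarray:
    # Hypothetical soft alternative: a sigmoid gate in log-frequency
    # space smoothly attenuates components below `cutoff` instead of
    # cutting them off at a hard edge.
    gate = 1.0 / (1.0 + np.exp(-sharpness * (np.log(freqs) - np.log(cutoff))))
    return freqs * gate

freqs = rope_freqs(128)
soft = soft_clip(freqs, cutoff=1e-3)   # high freqs ~unchanged, low freqs suppressed
hard = hard_clip(freqs, cutoff=1e-3)   # low freqs set exactly to zero
```

Under this sketch, high-frequency components (which encode local position) pass through nearly untouched, while the low-frequency components responsible for OOD rotation angles at long range are smoothly suppressed rather than discontinuously zeroed.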
Problem

Research questions and friction points this paper is trying to address.

Rotary Positional Embedding
Long Context
Out-of-Distribution
Semantic Modeling
Length Generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

CoPE
Rotary Positional Embedding
soft clipping
length generalization
long-context LLMs