MemRoPE: Training-Free Infinite Video Generation via Evolving Memory Tokens

📅 2026-03-12
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses the visual-quality degradation, identity drift, and motion stagnation that arise in long-form video generation when sliding-window caching strategies discard historical context. The authors propose a training-free autoregressive framework that can generate videos of unlimited duration under a fixed memory budget. By compressing historical information into memory tokens via exponential moving averages, the method preserves long-term consistency, while decoupled online rotary position embedding (RoPE) indexing keeps positional phases consistent and maintains short-term dynamics. The approach substantially improves temporal coherence, visual fidelity, and subject consistency in videos spanning minutes to hours, outperforming existing methods without any additional training.
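The dual-stream memory update described above can be sketched as a pair of exponential moving averages over pooled key chunks. This is a minimal illustration, not the paper's implementation: the function name, the mean-pooling step, and the decay values are assumptions chosen for clarity.

```python
import numpy as np

def update_memory(mem_long, mem_short, new_keys,
                  alpha_long=0.99, alpha_short=0.7):
    """Fold a new chunk of keys into two fixed-size EMA memory streams.

    mem_long / mem_short: (num_mem, d) memory tokens.
    new_keys: (chunk, d) keys from the newest frames; chunk must be a
    multiple of num_mem for the simple pooling used here.
    The slow stream (alpha_long) retains global identity; the fast
    stream (alpha_short) tracks recent dynamics. All names and decay
    values are illustrative assumptions, not the authors' exact method.
    """
    num_mem, d = mem_long.shape
    # Pool the incoming chunk down to the memory size (mean pooling here;
    # the actual aggregation rule may differ).
    pooled = new_keys.reshape(num_mem, -1, d).mean(axis=1)
    mem_long = alpha_long * mem_long + (1 - alpha_long) * pooled
    mem_short = alpha_short * mem_short + (1 - alpha_short) * pooled
    return mem_long, mem_short
```

Because both streams keep a fixed shape no matter how many chunks are folded in, the cache footprint stays constant over unbounded generation.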

📝 Abstract
Autoregressive diffusion enables real-time frame streaming, yet existing sliding-window caches discard past context, causing fidelity degradation, identity drift, and motion stagnation over long horizons. Current approaches preserve a fixed set of early tokens as attention sinks, but this static anchor cannot reflect the evolving content of a growing video. We introduce MemRoPE, a training-free framework with two co-designed components. Memory Tokens continuously compress all past keys into dual long-term and short-term streams via exponential moving averages, maintaining both global identity and recent dynamics within a fixed-size cache. Online RoPE Indexing caches unrotated keys and applies positional embeddings dynamically at attention time, ensuring the aggregation is free of conflicting positional phases. These two mechanisms are mutually enabling: positional decoupling makes temporal aggregation well-defined, while aggregation makes fixed-size caching viable for unbounded generation. Extensive experiments validate that MemRoPE outperforms existing methods in temporal coherence, visual fidelity, and subject consistency across minute- to hour-scale generation.
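The Online RoPE Indexing idea in the abstract (cache unrotated keys, apply positional embeddings only at attention time) can be illustrated with a standard rotary-embedding rotation. This is a hedged sketch under common RoPE conventions; the position-assignment scheme and shapes are assumptions, not the paper's exact procedure.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply rotary position embeddings to x (n, d) at integer positions.

    Uses the common half-split RoPE formulation; d must be even.
    """
    d = x.shape[1]
    half = d // 2
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    angles = positions[:, None] * inv_freq[None, :]   # (n, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

# Keys are cached without any rotation baked in, so EMA aggregation mixes
# no conflicting positional phases; fresh contiguous positions are
# assigned each attention step (an illustrative indexing choice).
cached_keys = np.random.randn(4, 8)
positions = np.arange(cached_keys.shape[0])
rotated = rope_rotate(cached_keys, positions)
```

Rotating at attention time means the same fixed-size cache can be re-indexed consistently at every step, which is what makes averaging cached keys well-defined.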
Problem

Research questions and friction points this paper is trying to address.

infinite video generation · temporal coherence · identity drift · memory caching · autoregressive diffusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory Tokens · Online RoPE Indexing · Training-Free Video Generation · Temporal Coherence · Autoregressive Diffusion
👥 Authors
Youngrae Kim · University of Southern California (Machine Learning, Computer Vision, Domain Adaptation)
Qixin Hu · University of Southern California
C.-C. Jay Kuo · University of Southern California
Peter A. Beerel · University of Southern California