FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion

📅 2026-06-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses a challenge in autoregressive video generation: the historical key-value (KV) cache grows with sequence length, and bounding it makes it hard to preserve long-term consistency and short-term detail at the same time. The authors propose FadeMem, a memory management mechanism that operates under a fixed cache budget. FadeMem uses a power-law temporal allocation schedule to merge historical KV blocks hierarchically and in a distance-aware manner: recent blocks are retained densely to preserve fine-grained detail, while distant blocks are kept sparsely to retain structure and identity. Without modifying the underlying model architecture, FadeMem combines fine-grained recent memory with coarse long-range anchors in a single cache. Experiments show improved subject consistency, background stability, and temporal coherence over existing bounded-cache methods.
📝 Abstract
Autoregressive video generators synthesize long videos by generating successive temporal segments, but their historical KV cache grows with video length. Existing bounded-cache methods reduce this cost with local windows, sink tokens, or compressed memory states, yet they usually assign fixed roles to different parts of the history. We propose FadeMem, a distance-aware KV memory consolidation mechanism that organizes historical KV blocks into a temporal hierarchy under a fixed cache budget. This design is motivated by frequency-dependent temporal decay: fine details decorrelate quickly, while coarse scene structure and identity remain useful over longer horizons. During generation, new history is inserted as fine-grained entries, while older adjacent entries are progressively merged under a power-law temporal allocation schedule, yielding a dense-near, sparse-far memory within one cache. Without architectural changes, FadeMem preserves recent context for short-term dynamics and compact long-range anchors for identity and scene coherence. Experiments show improved subject consistency, background stability, and temporal coherence over existing bounded-cache strategies.
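The insert-then-merge behaviour described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the merge operator or the exact allocation schedule, so everything here is an assumption made for illustration: the names `Entry`, `merge`, `insert`, and `alpha` are hypothetical, entries are merged by a span-weighted average, and the pair to merge is the adjacent pair whose combined span is smallest relative to a power of its distance from the present. Real KV blocks would be tensors; plain lists of floats stand in for them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    key: List[float]    # stand-in for one block's keys
    value: List[float]  # stand-in for one block's values
    span: int = 1       # number of original blocks this entry summarizes


def merge(a: Entry, b: Entry) -> Entry:
    """Span-weighted average of two adjacent entries (assumed merge rule)."""
    total = a.span + b.span

    def mix(x: List[float], y: List[float]) -> List[float]:
        return [(a.span * p + b.span * q) / total for p, q in zip(x, y)]

    return Entry(mix(a.key, b.key), mix(a.value, b.value), total)


def insert(memory: List[Entry], new: Entry, budget: int, alpha: float = 1.0) -> None:
    """Append a fine-grained entry, then merge until the budget holds.

    `memory` is ordered oldest -> newest and is modified in place.
    """
    memory.append(new)
    while len(memory) > budget:
        best_i, best_score = 0, float("inf")
        dist = 0  # age, in blocks, of the boundary between entries i and i+1
        for i in range(len(memory) - 2, -1, -1):
            dist += memory[i + 1].span
            merged_span = memory[i].span + memory[i + 1].span
            # Assumed power-law rule: a pair may be coarser the farther it is.
            score = merged_span / (dist ** alpha)
            if score < best_score:
                best_i, best_score = i, score
        memory[best_i:best_i + 2] = [merge(memory[best_i], memory[best_i + 1])]


memory: List[Entry] = []
for t in range(8):
    insert(memory, Entry([float(t)], [float(t)]), budget=4)
print([e.span for e in memory])  # [4, 2, 1, 1]
```

With a budget of four entries and eight inserted blocks, the two newest blocks stay at full granularity while the four oldest collapse into a single entry, which is the dense-near, sparse-far layout the abstract describes. Larger values of the assumed `alpha` push merging toward the distant end more aggressively.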
Problem

Research questions and friction points this paper is trying to address.

autoregressive video generation
KV cache
memory consolidation
temporal coherence
bounded-cache
Innovation

Methods, ideas, or system contributions that make the work stand out.

FadeMem
distance-aware memory
autoregressive video diffusion
KV cache consolidation
temporal hierarchy