WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering

📅 2026-05-30
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two problems that diffusion-based large language models face in long-context tasks: the high computational overhead and latency caused by multi-step iterative inference, and the inability of existing KV caching methods to preserve critical tokens accurately, which degrades generation quality. To overcome these limitations, the paper introduces WaveFilter, a training-free, plug-and-play caching framework that the authors describe as the first to integrate wavelet transforms into the KV cache mechanism. By applying a wavelet decomposition to the long sequence, WaveFilter identifies the tokens that carry essential information and retains only those to construct a sparse KV cache, in a way loosely modeled on how human readers extract salient content. Applied on top of multiple mainstream KV caching strategies, WaveFilter is reported to improve both long-context modeling efficiency and text generation quality while reducing computational cost and inference latency.
📝 Abstract
Diffusion Large Language Models (DLMs) have demonstrated significant advantages across various tasks. However, because they rely on multi-step iterative inference, their computational overhead and inference latency in long-context tasks have become core bottlenecks restricting large-scale deployment. When processing long sequences, existing Key-Value (KV) caching mechanisms often suffer a drastic drop in generation quality; the core challenge lies in filtering critical tokens precisely and efficiently within ultra-long contexts. Inspired by the human reading process, we propose WaveFilter, a universal and training-free caching framework. The framework applies a wavelet transform to decompose long sequences and thereby identify key tokens precisely; a sparse KV cache is then constructed from those tokens and used to compute the final contextual representation. Experimental results demonstrate that WaveFilter, as a plug-and-play generic framework, significantly enhances the performance of existing mainstream KV cache methods in complex long-context tasks.
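To make the idea of wavelet-guided token filtering concrete, the following is a minimal illustrative sketch and not the paper's method. The abstract does not specify the wavelet family, the signal being decomposed, or the selection rule, so everything here is an assumption: the per-token key norm is used as a 1D signal, a single-level Haar transform is applied, and tokens in pairs with the largest detail-coefficient magnitude are kept. The function names `haar_detail`, `select_tokens`, and `sparse_kv` and the `budget` parameter are hypothetical.

```python
import numpy as np


def haar_detail(signal: np.ndarray) -> np.ndarray:
    """Single-level Haar detail coefficients of a 1D signal.

    Odd-length input is padded by repeating the last sample.
    """
    x = np.asarray(signal, dtype=np.float64)
    if x.size % 2 == 1:
        x = np.append(x, x[-1])
    pairs = x.reshape(-1, 2)
    return (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)


def select_tokens(keys: np.ndarray, budget: int) -> np.ndarray:
    """Return sorted indices of at most `budget` tokens to keep.

    Assumption (not from the paper): the per-token key norm is the
    signal, and both tokens in a pair inherit the magnitude of that
    pair's Haar detail coefficient as their score.
    """
    n = keys.shape[0]
    signal = np.linalg.norm(keys, axis=-1)
    detail = np.abs(haar_detail(signal))
    scores = np.repeat(detail, 2)[:n]  # one score per original token
    budget = min(budget, n)
    top = np.argsort(-scores, kind="stable")[:budget]
    return np.sort(top)  # keep original token order


def sparse_kv(keys: np.ndarray, values: np.ndarray, budget: int):
    """Gather the selected rows of K and V into a smaller cache."""
    idx = select_tokens(keys, budget)
    return keys[idx], values[idx], idx


if __name__ == "__main__":
    # Toy cache: 8 tokens, head dimension 4, with one outlier token.
    keys = np.ones((8, 4))
    keys[5] *= 10.0
    values = np.arange(32, dtype=np.float64).reshape(8, 4)
    k, v, idx = sparse_kv(keys, values, budget=2)
    print(idx, k.shape, v.shape)
```

In this toy setup the selection favors the pair containing the outlier token, since that is where the signal changes most sharply. A real system would presumably score tokens per head or per layer and use a multi-level decomposition; those choices are left open here because the abstract does not describe them.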
Problem

Research questions and friction points this paper is trying to address.

Diffusion LLMs
long-context
KV cache
token filtering
inference latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wavelet Transform
KV Cache Filtering
Long-Context Modeling
Diffusion LLMs
Sparse Attention