Self-Augmenting Retrieval for Diffusion Language Models

📅 2026-06-04
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses a limitation of retrieval-augmented generation (RAG) with discrete diffusion language models: the low-confidence tokens discarded at each denoising step go unused, so retrieval happens late and surfaces insufficient evidence. The authors propose SARDI, a dynamic RAG framework that repurposes these discarded tokens as proactive retrieval signals, guiding external knowledge acquisition early in the generation process. SARDI requires no additional training and works with any off-the-shelf retriever and any reasoning-capable discrete diffusion language model. Across five multi-hop question answering benchmarks, SARDI outperforms existing training-free diffusion-based and autoregressive RAG approaches while achieving up to 8× higher throughput.
📝 Abstract
Discrete diffusion language models generate text by iteratively denoising an entire response in parallel. At each step, they predict tentative tokens for every masked position, committing the confident predictions to the output and discarding the unconfident ones. We show that the discarded tokens are in fact a useful lookahead signal for retrieval-augmented generation: even low-confidence tokens often surface salient entities early in the denoising trajectory, enabling retrieval of stronger evidence before the output is finalized. We exploit this through Self-Augmenting Retrieval for Diffusion Language Models (SARDI), a dynamic RAG framework that uses these lookahead tokens to guide retrieval during denoising. SARDI is training-free, retriever-agnostic, and applicable to any reasoning-capable discrete diffusion language model. Across five multi-hop QA benchmarks, SARDI outperforms current training-free diffusion and autoregressive retrieval baselines at up to $8\times$ higher throughput.
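The commit-or-discard step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `Prediction`, `denoise_step`, `build_query`, and `retrieve`, the confidence threshold of 0.9, and the word-overlap retriever are all assumptions made for illustration. The sketch shows one denoising step that commits confident predictions, collects the discarded low-confidence tokens as lookahead terms, and appends them to the question before retrieval.

```python
from dataclasses import dataclass

MASK = "[MASK]"  # hypothetical placeholder for a still-masked position


@dataclass
class Prediction:
    position: int
    token: str
    confidence: float


def denoise_step(sequence, predictions, threshold=0.9):
    """Commit confident predictions; return the rest as lookahead tokens.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    updated = list(sequence)
    lookahead = []
    for p in predictions:
        if updated[p.position] != MASK:
            continue  # position was already committed in an earlier step
        if p.confidence >= threshold:
            updated[p.position] = p.token  # commit to the output
        else:
            lookahead.append(p.token)  # normally discarded; kept here as a signal
    return updated, lookahead


def build_query(question, lookahead):
    """Append de-duplicated lookahead tokens to the original question."""
    return " ".join([question, *dict.fromkeys(lookahead)])


def retrieve(query, corpus, k=1):
    """Toy stand-in for a real retriever: rank documents by word overlap."""
    q = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


# One denoising step over a fully masked four-token response.
sequence = [MASK] * 4
predictions = [
    Prediction(0, "The", 0.95),
    Prediction(1, "director", 0.40),
    Prediction(2, "Nolan", 0.35),
    Prediction(3, ".", 0.97),
]
corpus = [
    "Christopher Nolan is a British-American film director",
    "Paris is the capital of France",
]

updated, lookahead = denoise_step(sequence, predictions)
query = build_query("Who directed Inception?", lookahead)
evidence = retrieve(query, corpus, k=1)
```

In this toy run the low-confidence tokens carry the salient entity before it is committed to the output, which is the property the abstract attributes to discarded tokens. A real system would replace `retrieve` with an actual retriever and feed the evidence back into later denoising steps.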
Problem

Research questions and friction points this paper is trying to address.

diffusion language models
retrieval-augmented generation
lookahead tokens
discrete diffusion
multi-hop QA
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion language models
retrieval-augmented generation
self-augmenting retrieval
lookahead tokens
training-free RAG