Self-Augmenting Retrieval for Diffusion Language Models

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Discrete diffusion language models discard their low-confidence token predictions at each denoising step. In retrieval-augmented generation (RAG), leaving those predictions unused means that evidence is retrieved late and is often insufficient. To address this, the authors propose SARDI, a dynamic RAG framework that repurposes the discarded tokens as proactive retrieval signals, guiding external knowledge acquisition early in the generation process. SARDI requires no additional training and works with any off-the-shelf retriever and any reasoning-capable discrete diffusion language model. Across five multi-hop question answering benchmarks, SARDI outperforms existing training-free diffusion-based and autoregressive RAG baselines at up to 8× higher throughput.

📝 Abstract

Discrete diffusion language models generate text by iteratively denoising an entire response in parallel. At each step, they predict tentative tokens for every masked position, committing the confident predictions to the output and discarding the unconfident ones. We show that the discarded tokens are in fact a useful lookahead signal for retrieval-augmented generation: even low-confidence tokens often surface salient entities early in the denoising trajectory, enabling retrieval of stronger evidence before the output is finalized. We exploit this through Self-Augmenting Retrieval for Diffusion Language Models (SARDI), a dynamic RAG framework that uses these lookahead tokens to guide retrieval during denoising. SARDI is training-free, retriever-agnostic, and applicable to any reasoning-capable discrete diffusion language model. Across five multi-hop QA benchmarks, SARDI outperforms current training-free diffusion and autoregressive retrieval baselines at up to $8\times$ higher throughput.
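The commit-or-discard step described above, and the idea of reusing the discarded tokens as a retrieval query, can be illustrated with a small toy sketch. This is not the authors' implementation: the `Prediction` type, the 0.9 confidence threshold, the stopword list, the word-overlap retriever, and the example tokens and passages are all assumptions made up for illustration, and no real diffusion model or retriever is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One tentative token for a masked position (hypothetical type)."""
    position: int
    token: str
    confidence: float


def split_by_confidence(predictions, threshold=0.9):
    """Partition one denoising step into committed and discarded predictions.

    The threshold value is an assumption, not taken from the paper.
    """
    committed = [p for p in predictions if p.confidence >= threshold]
    discarded = [p for p in predictions if p.confidence < threshold]
    return committed, discarded


def lookahead_query(question, discarded,
                    stopwords=frozenset({"the", "a", "of", "is", "in"})):
    """Append distinct, non-stopword discarded tokens to the question."""
    seen = set()
    extra = []
    for p in sorted(discarded, key=lambda p: p.position):
        key = p.token.lower()
        if key in stopwords or key in seen:
            continue
        seen.add(key)
        extra.append(p.token)
    return " ".join([question, *extra])


def retrieve(query, corpus, k=1):
    """Toy retriever: rank passages by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,  # Python's sort stays stable, so ties keep corpus order
    )
    return ranked[:k]


# Made-up predictions for one early denoising step.
step = [
    Prediction(0, "The", 0.97),
    Prediction(1, "director", 0.95),
    Prediction(2, "Nolan", 0.41),
    Prediction(3, "of", 0.35),
    Prediction(4, "London", 0.22),
]
question = "Where was the director of Inception born?"
corpus = [
    "Inception is a 2010 film",
    "Christopher Nolan was born in London",
]

committed, discarded = split_by_confidence(step)
query = lookahead_query(question, discarded)

print(query)                      # Where was the director of Inception born? Nolan London
print(retrieve(question, corpus)) # ['Inception is a 2010 film']
print(retrieve(query, corpus))    # ['Christopher Nolan was born in London']
```

In this toy setup the question alone ties both passages and returns the less useful one, while the query extended with the discarded tokens surfaces the passage that names the entity. That mirrors the abstract's claim that low-confidence tokens can expose salient entities before the output is finalized, although a real system would use an actual retriever and model confidences.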

Problem

Research questions and friction points this paper is trying to address.

diffusion language models

retrieval-augmented generation

lookahead tokens

discrete diffusion

multi-hop QA

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion language models

retrieval-augmented generation

self-augmenting retrieval