Forward-Free Diffusion Language Models

📅 2026-06-06

🤖 AI Summary

Traditional diffusion language models rely on handcrafted forward noising processes. In discrete language spaces, the states these processes produce are poorly aligned with the drafts a model actually generates, which limits generation quality. This work proposes FReDA, a diffusion language model that eliminates the predefined forward process by using recursive distribution refinement: model-generated drafts serve as intermediate states and are iteratively refined so that the draft distribution progressively approaches the target distribution. Because no manual noise schedule is required, the framework is neighborhood-agnostic, model-complexity-aware, and compatible with flexible refinement parameterizations. In the sub-8B regime, FReDA-4B outperforms larger diffusion baselines on reasoning and code tasks, with absolute gains of up to 15% and an average speedup of 1.5–1.8× over diffusion baselines.

📝 Abstract

Diffusion language models generate text through iterative denoising, offering a powerful alternative to autoregressive generation. However, discrete language spaces lack a natural neighborhood structure for defining effective perturbations, so existing methods resort to artificial corruption schemes for the forward process. Such prescribed forward processes often produce states that are mathematically convenient but misaligned with the drafts and errors actually encountered during generation, which degrades sample quality. To address this limitation, we propose FReDA, a forward-free diffusion language model that eliminates the need for a hand-designed forward process. We formulate diffusion language modeling as recursive distribution refinement, in which model-generated drafts serve as implicit intermediate states and the learned refinement model progressively moves the draft distribution toward the target distribution. Concretely, FReDA refines drafts by proposing candidate draft sequences and either directly performing self-refinement or selecting among parallel candidates via best-of-N refinement. With this design, FReDA is neighborhood-agnostic, model-complexity-aware, and compatible with flexible refinement parameterizations. Extensive evaluations in the sub-8B regime show that FReDA-4B outperforms larger diffusion base models on reasoning and coding benchmarks, with absolute gains of up to 15% and a 1.5–1.8× average speedup over diffusion baselines, and that it scales effectively with additional refinement computation.
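
The refinement loop described in the abstract (propose candidate drafts, then either accept a single refinement directly or select among parallel candidates) can be illustrated with a toy sketch. This is not the paper's implementation: `toy_propose`, `toy_score`, `TARGET`, and `VOCAB` are hypothetical stand-ins for the learned refinement model and the candidate selector, and keeping the current draft in the best-of-N pool is an assumption made here so that a step cannot lower the score.

```python
import random
from typing import Callable, List

Draft = List[str]

# Hypothetical toy setup: a fixed target sentence stands in for the target distribution.
TARGET: Draft = "the cat sat on the mat".split()
VOCAB: List[str] = sorted(set(TARGET))


def toy_propose(draft: Draft, rng: random.Random) -> Draft:
    """Stand-in for the learned refinement model: rewrite one position of the draft."""
    new = list(draft)
    i = rng.randrange(len(new))
    # Half the time the rewrite is correct, otherwise it is a random vocabulary token.
    new[i] = TARGET[i] if rng.random() < 0.5 else rng.choice(VOCAB)
    return new


def toy_score(draft: Draft) -> float:
    """Stand-in selector: number of positions that agree with the target."""
    return float(sum(a == b for a, b in zip(draft, TARGET)))


def recursive_refinement(
    initial: Draft,
    propose: Callable[[Draft, random.Random], Draft],
    score: Callable[[Draft], float],
    n_steps: int,
    n_candidates: int = 1,
    seed: int = 0,
) -> Draft:
    """Repeatedly refine a draft; each step's output is the next step's input."""
    rng = random.Random(seed)
    draft = list(initial)
    for _ in range(n_steps):
        if n_candidates == 1:
            # Self-refinement: accept the single proposed draft directly.
            draft = propose(draft, rng)
        else:
            # Best-of-N refinement: sample N candidates and keep the highest-scoring one.
            pool = [propose(draft, rng) for _ in range(n_candidates)]
            pool.append(draft)  # assumption: the current draft stays in the pool
            draft = max(pool, key=score)
    return draft


if __name__ == "__main__":
    start = ["mat"] * len(TARGET)
    refined = recursive_refinement(start, toy_propose, toy_score, n_steps=20, n_candidates=4)
    print(" ".join(refined), toy_score(refined))
```

Note that no forward corruption process appears anywhere in the loop: the intermediate states are simply the drafts produced by earlier steps. Raising `n_steps` or `n_candidates` spends more refinement computation per sample in this toy setting.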

Problem

Research questions and friction points this paper is trying to address.

diffusion language models

forward process

discrete language spaces

sample quality

text generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

forward-free diffusion

recursive distribution refinement

self-refinement