Plan, Verify and Fill: A Structured Parallel Decoding Approach for Diffusion Language Models

📅 2026-01-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion-based language models struggle to effectively leverage global bidirectional context during decoding, resulting in inefficient trajectory planning. This work proposes Plan-Verify-Fill (PVF), a novel hierarchical planning framework that, for the first time, integrates quantitative verification into the decoding process: it first constructs a semantic skeleton, then validates its structural plausibility, and finally fills in details in parallel. By combining semantic anchor prioritization, a structured verification protocol, and the bidirectional contextual capabilities of diffusion models, PVF enables training-free structured parallel decoding with a dynamic stopping strategy. Evaluated on LLaDA-8B-Instruct and Dream-7B-Instruct, PVF reduces function evaluations by up to 65% compared to confidence-based parallel decoding, substantially improving decoding efficiency while preserving generation quality.

Technology Category

Application Category

📝 Abstract
Diffusion Language Models (DLMs) offer a promising non-sequential paradigm for text generation, distinct from standard autoregressive (AR) approaches. However, current decoding strategies often adopt a reactive stance, underutilizing global bidirectional context to shape generation trajectories. To address this, we propose Plan-Verify-Fill (PVF), a training-free paradigm that grounds planning in quantitative validation. PVF actively constructs a hierarchical skeleton by prioritizing high-leverage semantic anchors, and employs a verification protocol to trigger pragmatic structural stopping once further deliberation yields diminishing returns. Extensive evaluations on LLaDA-8B-Instruct and Dream-7B-Instruct demonstrate that PVF reduces the Number of Function Evaluations (NFE) by up to 65% compared to confidence-based parallel decoding across benchmark datasets, delivering superior efficiency without compromising accuracy.
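The plan-then-verify-then-fill loop described in the abstract can be sketched in miniature. This is a hypothetical illustration only: the anchor-selection rule, verification metric, threshold values, and all function names below are assumptions for exposition, not the paper's actual protocol, and a dummy string stands in for the model's token predictions.

```python
# Hypothetical sketch of one Plan-Verify-Fill decoding round over masked positions.
MASK = "<mask>"

def plan(tokens, confidences, k):
    """Plan: pick the k highest-confidence masked positions as semantic anchors."""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    return sorted(masked, key=lambda i: confidences[i], reverse=True)[:k]

def verify(confidences, anchors, threshold):
    """Verify: accept the skeleton only if every anchor clears the threshold."""
    return all(confidences[i] >= threshold for i in anchors)

def plan_verify_fill(tokens, confidences, k=2, threshold=0.8):
    """One round: commit verified anchors, then fill confident positions in parallel.

    If verification fails, fall back to committing only the single best anchor,
    mimicking a conservative stopping decision.
    """
    anchors = plan(tokens, confidences, k)
    committed = set(anchors) if verify(confidences, anchors, threshold) else set(anchors[:1])
    filled = list(tokens)
    for i, tok in enumerate(tokens):
        if tok == MASK and (i in committed or confidences[i] >= threshold):
            filled[i] = f"tok{i}"  # stand-in for the model's predicted token
    return filled
```

Running the round on four masked positions with confidences `[0.9, 0.95, 0.3, 0.85]` fills three positions in one pass and leaves the low-confidence slot masked for a later round, which is the mechanism by which this family of methods cuts function evaluations relative to one-token-at-a-time decoding.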
Problem

Research questions and friction points this paper is trying to address.

Diffusion Language Models
Decoding Strategy
Global Context
Efficiency
Text Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Language Models
Parallel Decoding
Plan-Verify-Fill
Structured Generation
Training-Free Inference
🔎 Similar Papers
No similar papers found.
Miao Li
Georgia Institute of Technology
general machine learning, uncertainty quantification, generative models, RL, optimization proxies
Hanyang Jiang
Georgia Institute of Technology, Atlanta, USA
Sikai Chen
Assistant Professor, University of Wisconsin-Madison
Human-centered AI, Autonomous Vehicle, Reinforcement Learning, LLM/VLM, Roadway Safety
Hengyu Fu
University of California, Berkeley, USA
Yuhang Cai
University of California, Berkeley, USA
Baihe Huang
University of California, Berkeley
machine learning
Tinghan Ye
Georgia Institute of Technology, Atlanta, USA
Xuanzhou Chen
Georgia Institute of Technology, Atlanta, USA
P. V. Hentenryck
Georgia Institute of Technology, Atlanta, USA