Plan, Verify and Fill: A Structured Parallel Decoding Approach for Diffusion Language Models

📅 2026-01-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion-based language models struggle to effectively leverage global bidirectional context during decoding, resulting in inefficient trajectory planning. This work proposes Plan-Verify-Fill (PVF), a novel hierarchical planning framework that, for the first time, integrates quantitative verification into the decoding process: it first constructs a semantic skeleton, then validates its structural plausibility, and finally fills in details in parallel. By combining semantic anchor prioritization, a structured verification protocol, and the bidirectional contextual capabilities of diffusion models, PVF enables training-free structured parallel decoding with a dynamic stopping strategy. Evaluated on LLaDA-8B-Instruct and Dream-7B-Instruct, PVF reduces function evaluations by up to 65% compared to confidence-based parallel decoding, substantially improving decoding efficiency while preserving generation quality.

Technology Category

Application Category

📝 Abstract
Diffusion Language Models (DLMs) offer a promising non-sequential paradigm for text generation, distinct from standard autoregressive (AR) approaches. However, current decoding strategies often adopt a reactive stance, underutilizing global bidirectional context to shape generation trajectories. To address this, we propose Plan-Verify-Fill (PVF), a training-free paradigm that grounds planning in quantitative validation. PVF actively constructs a hierarchical skeleton by prioritizing high-leverage semantic anchors, and employs a verification protocol to trigger pragmatic structural stopping once further deliberation yields diminishing returns. Extensive evaluations on LLaDA-8B-Instruct and Dream-7B-Instruct demonstrate that PVF reduces the Number of Function Evaluations (NFE) by up to 65% compared to confidence-based parallel decoding across benchmark datasets, delivering superior efficiency without compromising accuracy.
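The plan-then-verify-then-fill loop described in the abstract can be sketched in miniature. This is a hypothetical illustration only: the anchor-selection rule, verification metric, threshold values, and all function names below are assumptions for exposition, not the paper's actual protocol, and a dummy string stands in for the model's token predictions.

```python
# Hypothetical sketch of one Plan-Verify-Fill decoding round over masked positions.
MASK = "<mask>"

def plan(tokens, confidences, k):
    """Plan: pick the k highest-confidence masked positions as semantic anchors."""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    return sorted(masked, key=lambda i: confidences[i], reverse=True)[:k]

def verify(confidences, anchors, threshold):
    """Verify: accept the skeleton only if every anchor clears the threshold."""
    return all(confidences[i] >= threshold for i in anchors)

def plan_verify_fill(tokens, confidences, k=2, threshold=0.8):
    """One round: commit verified anchors, then fill confident positions in parallel.

    If verification fails, fall back to committing only the single best anchor,
    mimicking a conservative stopping decision.
    """
    anchors = plan(tokens, confidences, k)
    committed = set(anchors) if verify(confidences, anchors, threshold) else set(anchors[:1])
    filled = list(tokens)
    for i, tok in enumerate(tokens):
        if tok == MASK and (i in committed or confidences[i] >= threshold):
            filled[i] = f"tok{i}"  # stand-in for the model's predicted token
    return filled
```

Running the round on four masked positions with confidences `[0.9, 0.95, 0.3, 0.85]` fills three positions in one pass and leaves the low-confidence slot masked for a later round, which is the mechanism by which this family of methods cuts function evaluations relative to one-token-at-a-time decoding.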
Problem

Research questions and friction points this paper is trying to address.

Diffusion Language Models
Decoding Strategy
Global Context
Efficiency
Text Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Language Models
Parallel Decoding
Plan-Verify-Fill
Structured Generation
Training-Free Inference
🔎 Similar Papers
No similar papers found.
Miao Li
Georgia Institute of Technology
general machine learning, uncertainty quantification, generative models, RL, optimization proxies
Hanyang Jiang
Georgia Institute of Technology, Atlanta, USA
Sikai Chen
Assistant Professor, University of Wisconsin-Madison
Human-centered AI, Autonomous Vehicle, Reinforcement Learning, LLM/VLM, Roadway Safety
Hengyu Fu
University of California, Berkeley, USA
Yuhang Cai
University of California, Berkeley, USA
Baihe Huang
University of California, Berkeley
machine learning
Tinghan Ye
Georgia Institute of Technology, Atlanta, USA
Xuanzhou Chen
Georgia Institute of Technology, Atlanta, USA
P. V. Hentenryck
Georgia Institute of Technology, Atlanta, USA