EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models

2026-05-30
Citations: 0
Influential: 0
AI Summary
This work addresses the severe loss of parallel inference efficiency in diffusion language models caused by existing context-free grammar (CFG) constrained decoding methods, which can be up to 4× slower than unconstrained decoding. To restore the models' inherent parallelism under CFG constraints, the authors propose EPIC, a framework that validates outputs with Earley-style parsing instead of deterministic automata, reduces repeated lexing through lexing memoization, and uses relaxed compatible subset selection to commit multiple compatible tokens in parallel. Experiments on three benchmarks with four models show that EPIC reduces inference time by up to 67.5% compared with existing CFG-constrained decoding methods and cuts the additional overhead by up to 90.5%.
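The general idea behind lexing memoization can be illustrated with a minimal sketch: cache the terminal sequence produced for a piece of text so that repeated validation of the same text does not re-run the lexer. This is not the EPIC implementation. The token definitions, the `lex` function, and the use of `functools.lru_cache` are assumptions chosen for illustration only.

```python
import re
from functools import lru_cache

# Hypothetical token definitions for a tiny arithmetic language
# (illustrative only, not taken from the paper).
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("PLUS", r"\+"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


@lru_cache(maxsize=None)
def lex(text: str) -> tuple:
    """Return the terminal names for `text`; results are cached per input string."""
    terminals = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"cannot lex input at position {pos}")
        if match.lastgroup != "WS":  # whitespace carries no terminal
            terminals.append(match.lastgroup)
        pos = match.end()
    return tuple(terminals)


if __name__ == "__main__":
    lex("1 + (2)")           # first call runs the lexer
    lex("1 + (2)")           # second call is served from the cache
    print(lex.cache_info())  # reports the hit and miss counts
```

In a decoding loop the same candidate text tends to be checked many times across denoising steps, which is the situation where a cache of this kind pays off. How EPIC keys and invalidates its memo table is not described in this summary.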
Abstract
Controlling language model outputs is essential for ensuring structural validity, reliability, and downstream usability, and diffusion language models are no exception. Recent advances in diffusion language model decoding have extended output control beyond regular constraints to context-free grammar (CFG) constraints. Existing methods, however, can be up to four times slower than unconstrained decoding. More importantly, they substantially diminish one of the key advantages of diffusion language models over autoregressive models, namely parallel decoding. This slowdown arises because sequential validity checking introduces significant overhead during parallel generation. We propose an efficient CFG-constrained decoding framework, EPIC, that addresses this limitation. Our method improves decoding efficiency by combining lexing memoization, validation using Earley-style parsing instead of deterministic automata, and relaxed compatible subset selection for parallel commit. It reduces repeated lexing and validation overhead while allowing multiple compatible tokens to be committed together. Experiments on three benchmarks using four models show that our method reduces inference time by up to 67.5% and decreases the additional overhead by up to 90.5% compared with existing CFG-constrained decoding methods. Our implementation is available at https://github.com/hyundong98/EPIC-Decoding.git .
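To make the Earley-style validation step more concrete, the following is a minimal textbook Earley recognizer, written as a sketch rather than as the authors' code. The toy grammar, the names `Item`, `build_chart`, `accepts`, and `is_viable_prefix`, and the treatment of a non-empty final chart set as "prefix can still be completed" are all assumptions for illustration. The last of these holds only when every nonterminal in the grammar can derive some terminal string.

```python
from typing import NamedTuple


class Item(NamedTuple):
    head: str    # left-hand-side nonterminal
    body: tuple  # right-hand-side symbols
    dot: int     # number of body symbols matched so far
    origin: int  # input position where this item started


# Hypothetical toy grammar (not from the paper): left-recursive sums.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("NUM",), ("(", "E", ")")],
}


def build_chart(tokens, grammar, start):
    """Run the Earley predict/scan/complete loop and return the chart."""
    n = len(tokens)
    chart = [[] for _ in range(n + 1)]

    def add(i, item):
        if item not in chart[i]:
            chart[i].append(item)

    for body in grammar[start]:
        add(0, Item(start, body, 0, 0))

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] may grow while it is processed
            item = chart[i][j]
            j += 1
            if item.dot < len(item.body):
                sym = item.body[item.dot]
                if sym in grammar:
                    # Predict: expand the nonterminal after the dot.
                    for body in grammar[sym]:
                        add(i, Item(sym, body, 0, i))
                    # Empty-production case: sym may already be complete here.
                    for done in list(chart[i]):
                        if (done.head == sym and done.origin == i
                                and done.dot == len(done.body)):
                            add(i, item._replace(dot=item.dot + 1))
                elif i < n and tokens[i] == sym:
                    # Scan: the next input terminal matches.
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:
                # Complete: advance every item that was waiting for item.head.
                for parent in list(chart[item.origin]):
                    if (parent.dot < len(parent.body)
                            and parent.body[parent.dot] == item.head):
                        add(i, parent._replace(dot=parent.dot + 1))
    return chart


def accepts(tokens, grammar=GRAMMAR, start="E"):
    """True if `tokens` is a complete sentence of the grammar."""
    final = build_chart(tokens, grammar, start)[len(tokens)]
    return any(it.head == start and it.origin == 0 and it.dot == len(it.body)
               for it in final)


def is_viable_prefix(tokens, grammar=GRAMMAR, start="E"):
    """True if `tokens` can still be extended to a sentence (see assumption above)."""
    return bool(build_chart(tokens, grammar, start)[len(tokens)])
```

With this grammar, `accepts(["NUM", "+", "(", "NUM", ")"])` is true, while `["NUM", "+"]` is rejected as a full sentence but still counts as a viable prefix. Unlike a deterministic automaton, the recognizer handles the left-recursive rule `E -> E + T` directly, with no grammar transformation. The sketch rebuilds the chart on every call and does not attempt the incremental or parallel-commit behavior that the abstract attributes to EPIC.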
Problem

Research questions and friction points this paper is trying to address.

diffusion language models
context-free grammar
constrained decoding
parallel inference
structural validity
Innovation

Methods, ideas, or system contributions that make the work stand out.

CFG-constrained decoding
diffusion language models
parallel inference
Earley parsing
lexing memoization