Unified Energy for Invariant and Independent Decoding in Diffusion Language Models

Date: 2026-06-08
Citations: 0
Influential: 0
AI Summary
Existing diffusion language models significantly underperform their autoregressive counterparts in highly parallel text generation because they struggle to model inter-token dependencies. This work systematically traces the performance gap to three factors: insufficient model capacity, inadequate dependency modeling, and a bias toward permutation invariance. To address these issues, the authors propose the Unified Energy (Uni-E) framework, a sampling-free, exactly computable, and model-agnostic energy function that combines invariant and independent energy terms to jointly correct the resulting distribution shift. Theoretical analysis shows that Uni-E corrects the distribution shift caused by dependency and invariance, and extensive experiments across diverse diffusion language models and diffusion large language models confirm that it substantially narrows the performance gap with autoregressive baselines.
πŸ“ Abstract
Diffusion Language Models (DLMs) enable parallel text generation by iteratively denoising a full sequence, offering attractive flexibility compared to auto-regressive (AR) decoding. However, existing methods fail to fully capture token relationships, leading to a performance gap relative to AR baselines, especially as the degree of parallelism increases. In this paper, we systematically analyze this gap and identify three key factors: (i) model capacity, (ii) dependency, and (iii) invariance. To address these issues, we first propose an invariant energy (Inv-E) together with an effective sampling-based estimator to handle the invariance issue. By further combining it with the independent energy (Ind-E), we obtain a unified energy (Uni-E), which accounts for all these factors. Uni-E enjoys a unique advantage: it can be computed exactly without sampling-based partition estimation. Moreover, Uni-E is model-agnostic and can therefore be scaled to models of arbitrary size. We further prove that Uni-E can correct the distribution shift caused by dependency and invariance. Extensive experiments across DLMs and Diffusion Large Language Models (DLLMs) demonstrate the effectiveness of the proposed Uni-E.
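The dependency factor (ii), and why the gap widens with parallelism, can be illustrated with a toy calculation. The sketch below is not the paper's Inv-E, Ind-E, or Uni-E; it only assumes that one parallel step unmasks each position independently from its marginal, and it measures the resulting distribution shift with total variation distance (our choice of metric; the paper's notion of shift may differ). The token distributions, the helper names `marginals`, `product_of_marginals`, `total_variation`, and `two_mode_joint`, and the "New York"/"San Francisco" example are all hypothetical.

```python
import itertools


def marginals(joint, n_pos):
    """Per-position marginal distributions of a joint distribution over token tuples."""
    margs = [{} for _ in range(n_pos)]
    for seq, p in joint.items():
        for i, tok in enumerate(seq):
            margs[i][tok] = margs[i].get(tok, 0.0) + p
    return margs


def product_of_marginals(margs):
    """Distribution obtained by unmasking every position independently in one step (assumed decoder)."""
    prod = {}
    for combo in itertools.product(*(m.items() for m in margs)):
        seq = tuple(tok for tok, _ in combo)
        p = 1.0
        for _, q in combo:
            p *= q
        prod[seq] = p
    return prod


def total_variation(p, q):
    """Total variation distance between two distributions stored as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical two-token example: the true joint never mixes the two city names.
city_joint = {("New", "York"): 0.5, ("San", "Francisco"): 0.5}
city_factored = product_of_marginals(marginals(city_joint, 2))
print(city_factored[("New", "Francisco")])  # 0.25: an incoherent pair gets real mass


def two_mode_joint(k):
    """Hypothetical k-token joint: all 'a' or all 'b', each with probability 0.5."""
    return {("a",) * k: 0.5, ("b",) * k: 0.5}


# The shift grows with the number of tokens decoded in parallel: TV = 1 - 2**(1 - k).
for k in (2, 3, 4):
    joint = two_mode_joint(k)
    factored = product_of_marginals(marginals(joint, k))
    print(k, total_variation(joint, factored))
```

Running it prints 0.25 for the incoherent pair and distances 0.5, 0.75, and 0.875 for k = 2, 3, 4. Any correction that restores dependency information, energy-based or otherwise, has to recover the mass the factorized step spreads over mixed sequences.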
Problem

Research questions and friction points this paper is trying to address.

Diffusion Language Models
parallel text generation
token relationships
performance gap
auto-regressive decoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Energy
Diffusion Language Models
Invariance
Dependency
Parallel Decoding