What Gets Unmasked First? Trajectory Analysis of Diffusion Models for Graph-to-Text Generation

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies the decoding dynamics of Masked Diffusion Language Models (MDLMs) in graph-to-text generation and finds a consistent order: entities are unmasked first, followed by relational and function words, with structural tokens resolved last. The study shows that supervised fine-tuning disrupts this order by anchoring sentence-ending structural tokens early in the decoding trajectory, which effectively fixes the output length and can lead to omitted or hallucinated information. To address this, the authors propose λ-scaled structural decoding, a training-free inference-time modification that downweights structural token confidence and recovers +9.4 BLEU-4. They also introduce Graph-LLaDA, which integrates a Graph Transformer encoder into LLaDA's decoding process to incorporate relational graph structure explicitly. Cross-dataset evaluation on LAGRANGE indicates that previous baselines overfit to dataset-specific patterns, while LLM- and MDLM-based approaches generalize significantly better.

📝 Abstract

We present the first systematic study of masked diffusion language models (MDLMs) for graph-to-text generation. We analyze MDLM generation trajectories (the order in which tokens are unmasked during iterative decoding) and find that, unlike autoregressive LLMs, which generate text linearly, MDLMs naturally prioritize entities first, followed by relational and function words, with structural tokens resolved last. We further identify a previously undocumented failure mode of supervised fine-tuning: SFT disrupts this strategy by anchoring structural sentence-ending tokens early in the decoding trajectory, effectively fixing the output length, which can lead to omitted or hallucinated information. To address this, we propose lambda-scaled structural decoding, a training-free inference-time modification that downweights structural token confidence and recovers +9.4 BLEU-4. Finally, we introduce Graph-LLaDA, which integrates a Graph Transformer encoder into LLaDA's decoding process to explicitly incorporate relational graph structure. Cross-dataset evaluation on LAGRANGE reveals that previous baselines overfit to dataset-specific patterns, while LLM- and MDLM-based approaches generalize significantly better.
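
The idea behind lambda-scaled structural decoding can be illustrated with a small sketch of one confidence-based unmasking step. This is not the authors' implementation. The abstract does not say which tokens count as structural, what value λ takes, or whether the scaling is applied to probabilities or logits, so the function names, the structural token set, the multiplicative form, and all numbers below are assumptions made for illustration only.

```python
from typing import List, Set


def scale_structural(confidences: List[float], predicted: List[str],
                     structural: Set[str], lam: float) -> List[float]:
    """Multiply the confidence of structural tokens by lam (assumed 0 < lam <= 1)."""
    return [c * lam if tok in structural else c
            for c, tok in zip(confidences, predicted)]


def select_to_unmask(confidences: List[float], predicted: List[str],
                     masked_positions: List[int], structural: Set[str],
                     lam: float, k: int) -> List[int]:
    """Pick the k masked positions with the highest lambda-scaled confidence."""
    scaled = scale_structural(confidences, predicted, structural, lam)
    ranked = sorted(masked_positions, key=lambda i: scaled[i], reverse=True)
    return ranked[:k]


# Hypothetical predictions for one decoding step; every position is still masked.
predicted = ["Alan", "Turing", "was", "born", "in", "London", ".", "<eos>"]
confidences = [0.90, 0.88, 0.60, 0.70, 0.55, 0.85, 0.95, 0.97]
structural = {".", "<eos>"}          # assumed structural token set
masked = list(range(len(predicted)))

# lam = 1.0 leaves confidences unchanged: the sentence-ending tokens win.
unscaled_choice = select_to_unmask(confidences, predicted, masked, structural, 1.0, 2)

# lam = 0.5 downweights structural tokens: the entity tokens are unmasked first.
scaled_choice = select_to_unmask(confidences, predicted, masked, structural, 0.5, 2)
```

With these made-up numbers, the unscaled step commits to the period and end-of-sequence token first, which mirrors the premature anchoring of output length described above, while the scaled step defers them in favor of the entity tokens. Because only the ranking used for unmasking changes, a modification of this kind needs no retraining.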

Problem

Research questions and friction points this paper is trying to address.

masked diffusion language models

graph-to-text generation

decoding trajectory

supervised fine-tuning

structural tokens

Innovation

Methods, ideas, or system contributions that make the work stand out.

masked diffusion language models

graph-to-text generation

trajectory analysis