SAID: Accelerating Diffusion-Based Language Models via Scaffold-Aware Iterative Decoding

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Diffusion-based large language models (DLLMs) are slow at inference because generation requires many denoising steps. This work proposes Scaffold-Aware Iterative Decoding (SAID), a framework that identifies scaffold tokens, spends denoising computation on them first to establish a coarse semantic structure, and then completes the more predictable detail tokens in fewer steps. SAID is also adapted to block-wise diffusion decoding and paired with a Confidence-Hierarchical Layered Generation (CHLG) strategy, which assigns additional denoising steps only to low-confidence tokens. Described as the first scaffold-aware mechanism in diffusion language modeling, the approach redistributes computation across tokens and achieves up to a 9.1× speedup on LLaDA-8B and LLaDA 1.5 while maintaining competitive performance on math, coding, and knowledge benchmarks.
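To make the confidence-based step allocation concrete, here is a minimal sketch of how a per-token confidence could be mapped to an extra-step budget. It is not the authors' implementation: the function name `allocate_extra_steps`, the tier thresholds, and the step counts are all hypothetical, chosen only to illustrate the idea that high-confidence tokens receive no further denoising while low-confidence tokens receive more.

```python
from typing import List, Sequence, Tuple

# Hypothetical tiers: (minimum confidence, extra denoising steps).
# These thresholds are illustrative and do not come from the paper.
DEFAULT_TIERS: Tuple[Tuple[float, int], ...] = ((0.9, 0), (0.6, 1), (0.0, 3))


def allocate_extra_steps(
    confidences: Sequence[float],
    tiers: Sequence[Tuple[float, int]] = DEFAULT_TIERS,
) -> List[int]:
    """Map each token's confidence to an extra-step budget.

    Tiers are checked from the highest threshold down, so a token gets
    the budget of the first tier whose threshold it meets.
    """
    ordered = sorted(tiers, key=lambda t: t[0], reverse=True)
    budgets: List[int] = []
    for c in confidences:
        for threshold, steps in ordered:
            if c >= threshold:
                budgets.append(steps)
                break
        else:
            # Only reachable if no tier has a threshold low enough.
            raise ValueError(f"confidence {c} is below every tier threshold")
    return budgets


if __name__ == "__main__":
    conf = [0.97, 0.72, 0.31, 0.90]
    print(allocate_extra_steps(conf))  # [0, 1, 3, 0]
```

In this toy run only the third token (confidence 0.31) receives the full extra budget, which is the sense in which computation is concentrated on uncertain positions.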

📝 Abstract

Diffusion large language models (DLLMs) enable non-autoregressive generation by iteratively denoising corrupted token sequences with bidirectional context. Despite their ability to update multiple positions in parallel, inference remains costly due to the many denoising steps required for high-quality generation. We propose SAID, a Scaffold-Aware Iterative Decoding framework that accelerates DLLMs by reallocating computation across tokens. SAID first spends denoising computation on scaffold tokens to establish the coarse semantic structure, and then completes predictable detail tokens with fewer steps. We further adapt SAID to block-wise diffusion decoding and introduce Confidence-Hierarchical Layered Generation (CHLG), which assigns additional steps only to low-confidence tokens. Experiments on LLaDA-8B and LLaDA 1.5 across math, coding, and knowledge benchmarks show that SAID significantly accelerates DLLM inference with a maximum speedup of 9.1× while maintaining competitive performance. Our code is publicly available: https://github.com/TH-AI-Lab-PKU/SAID.
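The scaffold-first ordering in the abstract can be pictured as an unmasking schedule: scaffold positions are resolved over several steps, then the remaining detail positions are filled in over fewer steps. The sketch below only builds such a schedule and does not run a model. It assumes the scaffold positions are already known (the abstract does not say how they are identified), and it splits positions by index order rather than by model confidence; `split_evenly` and `scaffold_first_schedule` are hypothetical names.

```python
from typing import Iterable, List, Sequence


def split_evenly(positions: Sequence[int], steps: int) -> List[List[int]]:
    """Split positions into at most `steps` contiguous, near-equal groups."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    n = len(positions)
    groups: List[List[int]] = []
    start = 0
    for i in range(steps):
        # The first (n % steps) groups take one extra position.
        size = n // steps + (1 if i < n % steps else 0)
        groups.append(list(positions[start:start + size]))
        start += size
    return [g for g in groups if g]  # drop empty groups


def scaffold_first_schedule(
    length: int,
    scaffold: Iterable[int],
    scaffold_steps: int,
    detail_steps: int,
) -> List[List[int]]:
    """Return the positions to unmask at each step, scaffold phase first.

    Assumption: `scaffold` is given; real scaffold detection is out of scope.
    """
    scaffold_set = set(scaffold)
    scaffold_pos = [i for i in range(length) if i in scaffold_set]
    detail_pos = [i for i in range(length) if i not in scaffold_set]
    return split_evenly(scaffold_pos, scaffold_steps) + split_evenly(
        detail_pos, detail_steps
    )


if __name__ == "__main__":
    schedule = scaffold_first_schedule(
        length=8, scaffold={0, 3, 6}, scaffold_steps=3, detail_steps=1
    )
    print(schedule)  # [[0], [3], [6], [1, 2, 4, 5, 7]]
```

In this toy case the sequence is completed in 4 steps instead of the 8 that a one-token-per-step schedule would need. The figure is purely illustrative and says nothing about the speedups reported in the paper.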

Problem

Research questions and friction points this paper is trying to address.

Diffusion Language Models

Non-autoregressive Generation

Inference Acceleration

Denoising Steps

Computational Cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Language Models

Non-autoregressive Generation

Scaffold-Aware Decoding