🤖 AI Summary
Existing discrete diffusion language models predominantly adopt full-decoder architectures, where each denoising step requires executing the entire network, resulting in high computational overhead and inefficient inference. Method: We propose the first encoder-decoder-based discrete diffusion model: a dedicated encoder learns clean-text representations, while a lightweight decoder performs iterative denoising; combined with block-wise sequence partitioning and specialized training/sampling algorithms, this design decouples representation learning from noise removal. Contribution/Results: Our architecture significantly improves training stability and inference throughput. Empirical evaluation on summarization, machine translation, and mathematical reasoning demonstrates superior quality-latency trade-offs at reduced computational cost. This work establishes a new paradigm for efficient discrete diffusion modeling.
📝 Abstract
Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use decoder-only architectures, which require sampling algorithms that invoke the full network at every denoising step, incurring high computational cost. Our key insight is that discrete diffusion models perform two types of computation: 1) representing clean tokens and 2) denoising corrupted tokens, which enables us to use a separate module for each task. We propose an encoder-decoder architecture to accelerate discrete diffusion inference: an encoder represents the clean tokens, while a lightweight decoder iteratively refines a noised sequence. We also show that this architecture enables faster training of block diffusion models, which partition sequences into blocks for better quality and are commonly used in diffusion language model inference. We introduce a framework for Efficient Encoder-Decoder Diffusion (E2D2), consisting of an architecture with specialized training and sampling algorithms, and we show that E2D2 achieves superior trade-offs between generation quality and inference throughput on summarization, translation, and mathematical reasoning tasks. We provide the code, model weights, and blog post on the project page: https://m-arriola.com/e2d2
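The division of labor described above can be sketched in a few lines: a heavyweight encoder runs once per block to represent the clean prefix, while only a lightweight decoder runs at every denoising step of the current block. This is an illustrative sketch with dummy stand-in modules (`encode_clean`, `decoder_step`) and a mask-based noising scheme; it is not the authors' actual API or model.

```python
# Illustrative sketch of encoder-decoder block diffusion sampling.
# `encode_clean` and `decoder_step` are hypothetical stand-ins, not E2D2's API.
import random

MASK = "<mask>"

def encode_clean(tokens):
    # Stand-in for the full encoder: one pass over the clean prefix per block.
    return list(tokens)  # pretend these are contextual representations

def decoder_step(context_reprs, noisy_block):
    # Stand-in for the lightweight decoder: unmask some tokens each step.
    return ["tok" if t == MASK and random.random() < 0.5 else t
            for t in noisy_block]

def sample(num_blocks=3, block_size=4, steps_per_block=8):
    clean = []                                # generated (clean) prefix
    for _ in range(num_blocks):
        context = encode_clean(clean)         # encoder: once per block
        block = [MASK] * block_size           # fully noised block
        for _ in range(steps_per_block):      # decoder: every denoising step
            block = decoder_step(context, block)
        block = [t if t != MASK else "tok" for t in block]  # finalize leftovers
        clean.extend(block)
    return clean
```

The cost asymmetry is the point: per generated block, the expensive encoder is invoked once while the cheap decoder is invoked `steps_per_block` times, so shrinking the decoder directly reduces per-step denoising cost.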