Generalized Interpolating Discrete Diffusion

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing discrete diffusion language models (e.g., masked diffusion) achieve reasonable performance but cannot revise previously generated tokens, limiting output quality. To address this, the paper proposes Generalized Interpolating Discrete Diffusion (GIDD), a unified theoretical framework for interpolating discrete diffusion processes, together with a novel diffusion evidence lower bound (ELBO). GIDD introduces a generalized interpolating noise schedule and a hybrid masking-uniform noising strategy, enabling controllable noise injection and self-correcting sampling that lets the model fix its own mistakes during generation. Under matched compute budgets, GIDD achieves state-of-the-art performance among diffusion language models and improves sample quality and textual coherence. The code and pretrained models are publicly released.
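As a sketch of what "interpolating" means here (the notation below is an illustrative assumption, not taken from the paper's text): the forward process mixes the clean token distribution with a time-varying noise distribution, and masked diffusion falls out as a special case.

```latex
% Sketch of a general interpolating forward marginal (notation assumed):
% x is the one-hot clean token, \pi_t a time-varying noise distribution,
% and \alpha_t + \beta_t = 1 an interpolation schedule.
q_t(z_t \mid x) = \mathrm{Cat}\!\left(z_t;\ \alpha_t\, x + \beta_t\, \pi_t\right)
% Masked diffusion is recovered by fixing \pi_t = \mathbf{m} (the mask token);
% a hybrid schedule mixes mask and uniform noise over a vocabulary of size V, e.g.
% \pi_t = (1 - u_t)\,\mathbf{m} + u_t \cdot \tfrac{1}{V}\mathbf{1}.
```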

📝 Abstract
While state-of-the-art language models achieve impressive results through next-token prediction, they have inherent limitations such as the inability to revise already generated tokens. This has prompted exploration of alternative approaches such as discrete diffusion. However, masked diffusion, which has emerged as a popular choice due to its simplicity and effectiveness, reintroduces this inability to revise words. To overcome this, we generalize masked diffusion and derive the theoretical backbone of a family of general interpolating discrete diffusion (GIDD) processes offering greater flexibility in the design of the noising processes. Leveraging a novel diffusion ELBO, we achieve compute-matched state-of-the-art performance in diffusion language modeling. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality and unlocking the ability for the model to correct its own mistakes, an area where autoregressive models notoriously have struggled. Our code and models are open-source: https://github.com/dvruette/gidd/
Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations of next-token prediction in language models
Generalizing masked diffusion for flexible noising processes
Enabling models to revise and correct generated tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized Interpolating Discrete Diffusion (GIDD)
Novel diffusion ELBO for improved performance
Hybrid approach combining masking and uniform noise
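A minimal sketch of the hybrid masking-uniform corruption idea from the Innovation list. All names, the linear schedule, and the fixed uniform fraction are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def hybrid_corrupt(tokens, t, vocab_size, mask_id, u_frac=0.2, rng=None):
    """Corrupt a token sequence at noise level t in [0, 1] (illustrative only).

    Each token is replaced with probability t (a linear schedule, assumed);
    a replaced token becomes the mask token with probability (1 - u_frac)
    or a uniformly random vocabulary token with probability u_frac.
    """
    rng = rng or np.random.default_rng()
    tokens = np.asarray(tokens)
    replace = rng.random(tokens.shape) < t            # which positions get noised
    use_uniform = rng.random(tokens.shape) < u_frac   # mask vs. uniform noise
    noisy = tokens.copy()
    noisy[replace & ~use_uniform] = mask_id
    random_toks = rng.integers(0, vocab_size, size=tokens.shape)
    noisy[replace & use_uniform] = random_toks[replace & use_uniform]
    return noisy
```

Mixing in uniform noise is what gives the model corrupted-but-unmasked tokens to fix at training time, which is the mechanism behind the self-correcting sampling described above.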
Dimitri von Rütte
Data Analytics Lab, Department of Computer Science, ETH Zürich; ELLIS Institute Tübingen, Tübingen AI Center; Max Planck Institute for Intelligent Systems, Tübingen
J. Fluri
Data Analytics Lab, Department of Computer Science, ETH Zürich
Yuhui Ding
ETH Zürich
Machine Learning
Antonio Orvieto
ELLIS Institute Tübingen; Max Planck Institute for Intelligent Systems
Deep Learning, Machine Learning, Optimization, Differential Equations, Numerical Analysis
Bernhard Schölkopf
Data Analytics Lab, Department of Computer Science, ETH Zürich; ELLIS Institute Tübingen, Tübingen AI Center; Max Planck Institute for Intelligent Systems, Tübingen
Thomas Hofmann
Data Analytics Lab, Department of Computer Science, ETH Zürich