A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding

📅 2026-06-11
🤖 AI Summary
This work addresses the lack of principled reward-guided fine-tuning methods for any-length discrete diffusion models by proposing a unified framework, A2D2. The framework derives the Radon-Nikodym derivative of the joint insertion-unmasking path measures, which guarantees convergence to the reward-tilted target distribution without requiring target samples. It further introduces the Adaptive Joint Decoding (AJD) loss, which provably yields the optimal path measure that generates this distribution. By jointly optimizing the insertion and unmasking policies together with a quality-based inference schedule, A2D2 improves reward optimization, generation flexibility, and accuracy over prior fixed-length fine-tuning and inference-time guidance methods.
📝 Abstract
Discrete diffusion models offer a simple and stable likelihood-based framework for sequence generation, recently extended to any-length settings via token insertion. Principled reward-guided fine-tuning for any-length discrete diffusion, however, remains largely unexplored. We introduce Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding (A2D2), a unified framework for reward-guided fine-tuning of any-length discrete diffusion models via joint optimization of the insertion and unmasking policies together with a quality-based inference schedule. We derive the Radon-Nikodym derivative for the joint insertion-unmasking path measures, enabling theoretically guaranteed convergence to the intractable reward-tilted sequence distribution without requiring target samples. Building on this, we establish unmasking and insertion quality as tractable approaches for minimizing decoding error and introduce the Adaptive Joint Decoding (AJD) loss, which provably yields the optimal path measure that generates the reward-tilted distribution. Empirically, A2D2 improves reward optimization while enhancing generation flexibility and accuracy over prior fixed-length fine-tuning and inference-time guidance methods.
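The abstract does not state the exact form of the reward-tilted distribution. A common convention in reward-guided fine-tuning is p*(x) ∝ p_base(x) · exp(r(x) / alpha), and the minimal sketch below assumes that form on a toy set of variable-length sequences. The names `reward_tilt`, `base_probs`, `rewards`, and `alpha` are hypothetical and not taken from the paper, and the numbers are illustrative only. On a finite space, the Radon-Nikodym derivative of the tilted distribution with respect to the base distribution reduces to a ratio of probabilities, which is what `density_ratio` computes. This is not the paper's path-measure derivation, only the endpoint distribution it targets under the stated assumption.

```python
import math


def reward_tilt(base_probs, rewards, alpha=1.0):
    """Return p*(x) proportional to p_base(x) * exp(r(x) / alpha).

    Assumed form of the reward-tilted distribution; alpha is a
    hypothetical temperature, not a parameter named in the source.
    """
    weights = {x: p * math.exp(rewards[x] / alpha) for x, p in base_probs.items()}
    z = sum(weights.values())  # normalizing constant
    return {x: w / z for x, w in weights.items()}


# Toy support containing sequences of different lengths (illustrative values).
base_probs = {"a": 0.5, "ab": 0.3, "abc": 0.2}
rewards = {"a": 0.0, "ab": 1.0, "abc": 2.0}

tilted = reward_tilt(base_probs, rewards, alpha=1.0)

# On a finite space, d p* / d p_base is simply the ratio of the two probabilities.
density_ratio = {x: tilted[x] / base_probs[x] for x in base_probs}
```

Higher-reward sequences gain probability mass relative to the base model, and the density ratio between any two sequences depends only on their reward difference.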
Problem

Research questions and friction points this paper is trying to address.

discrete diffusion
any-length generation
reward-guided fine-tuning
adaptive decoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete diffusion
reward-guided fine-tuning
any-length generation
adaptive decoding
Radon-Nikodym derivative