Test-Time Alignment of Discrete Diffusion Models with Sequential Monte Carlo

📅 2025-05-28

📈 Citations: 0

✨ Influential: 0

career value

240K/year

🤖 AI Summary

Discrete diffusion models face a challenge during inference: they must satisfy external reward constraints without permitting model fine-tuning. Method: We propose a zero-shot, training-free sampling framework that— for the first time—couples Twisted Sequential Monte Carlo (Twisted SMC) with Gumbel-Softmax relaxation, and employs first-order Taylor approximation to efficiently estimate reward gradients. This enables differentiable, low-variance, high-fidelity importance reweighting in the discrete latent space. Contribution/Results: The framework requires no model modification or auxiliary training, yet significantly improves constraint satisfaction rates and generation quality across synthetic data and image modeling tasks. It establishes a new plug-and-play paradigm for controllable generation with discrete diffusion models.

Technology Category

Application Category

📝 Abstract

Discrete diffusion models have become highly effective across various domains. However, real-world applications often require the generative process to adhere to certain constraints but without task-specific fine-tuning. To this end, we propose a training-free method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution at the test time. Our approach leverages twisted SMC with an approximate locally optimal proposal, obtained via a first-order Taylor expansion of the reward function. To address the challenge of ill-defined gradients in discrete spaces, we incorporate a Gumbel-Softmax relaxation, enabling efficient gradient-based approximation within the discrete generative framework. Empirical results on both synthetic datasets and image modelling validate the effectiveness of our approach.

Problem

Research questions and friction points this paper is trying to address.

Aligning discrete diffusion models with constraints without fine-tuning

Sampling reward-aligned distributions using Sequential Monte Carlo

Handling ill-defined gradients in discrete spaces via Gumbel-Softmax

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sequential Monte Carlo for test-time alignment

Gumbel-Softmax relaxation for discrete gradients

First-order Taylor expansion for reward approximation

🔎 Similar Papers

Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis