🤖 AI Summary
To address the challenge of fine-grained user-intent alignment in Score Distillation Sampling (SDS), particularly for text-to-3D generation, this work introduces RewardSDS, which incorporates a reward model into the SDS framework. RewardSDS weights noise samples by their alignment scores from the reward model, yielding a weighted SDS loss whose gradients prioritize noise samples that produce aligned, high-reward outputs. The same weighting extends to Variational Score Distillation (VSD), yielding RewardVSD. Experiments demonstrate that RewardSDS and RewardVSD consistently outperform SDS and VSD across text-to-image generation, 2D image editing, and text-to-3D synthesis, achieving state-of-the-art generation quality, semantic fidelity, and alignment to the desired reward models.
📝 Abstract
Score Distillation Sampling (SDS) has emerged as an effective technique for leveraging 2D diffusion priors for tasks such as text-to-3D generation. While powerful, SDS struggles to achieve fine-grained alignment to user intent. To overcome this, we introduce RewardSDS, a novel approach that weights noise samples based on alignment scores from a reward model, producing a weighted SDS loss. This loss prioritizes gradients from noise samples that yield aligned, high-reward outputs. Our approach is broadly applicable and can extend existing SDS-based methods. In particular, we demonstrate its applicability to Variational Score Distillation (VSD) by introducing RewardVSD. We evaluate RewardSDS and RewardVSD on text-to-image, 2D editing, and text-to-3D generation tasks, showing significant improvements over SDS and VSD on a diverse set of metrics measuring generation quality and alignment to desired reward models, enabling state-of-the-art performance. Project page is available at https://itaychachy.github.io/reward-sds/.
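The reward-weighted sampling idea above can be sketched in a few lines. The toy below is an illustration only: the diffusion prior, reward model, and the softmax weighting over rewards are all stand-in assumptions, not the paper's actual components. It shows the core loop, though: draw several noise samples, compute a per-sample SDS-style gradient, score each (toy) denoised estimate with a reward model, and combine the gradients with reward-based weights so high-reward noise samples dominate the update.

```python
import numpy as np

def reward_weighted_sds_grad(x, n_noise=8, temperature=1.0, seed=0):
    """Toy sketch of RewardSDS-style weighting (stand-in models, not the paper's).

    For each sampled noise we compute a per-sample SDS-style gradient and a
    scalar reward on a toy one-step denoised estimate, then average the
    gradients with softmax weights over the rewards.
    """
    rng = np.random.default_rng(seed)
    noises = rng.normal(size=(n_noise,) + x.shape)

    # Stand-in for the diffusion prior's noise prediction eps_phi(x_t).
    eps_pred = lambda x_t: 0.1 * x_t
    # Stand-in for a reward model scoring a denoised sample.
    reward = lambda x0: -np.mean(x0 ** 2)

    grads, rewards = [], []
    for eps in noises:
        x_t = x + eps                    # toy forward-noised input
        grads.append(eps_pred(x_t) - eps)  # per-sample SDS-style gradient
        x0_hat = x_t - eps_pred(x_t)     # toy one-step denoised estimate
        rewards.append(reward(x0_hat))

    # Softmax weighting over rewards (an assumption; the actual
    # weighting scheme may differ).
    w = np.exp(np.array(rewards) / temperature)
    w /= w.sum()
    return np.tensordot(w, np.array(grads), axes=1)

g = reward_weighted_sds_grad(np.zeros((4, 4)))
print(g.shape)  # (4, 4)
```

Setting `temperature` small concentrates the weights on the best-scoring noise samples; a large value recovers the ordinary uniform SDS average.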