Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that existing generative models struggle to efficiently and robustly align with arbitrary user preferences or constraints—referred to as reward alignment—after training. The authors propose Diamond Maps, a stochastic flow mapping framework that intrinsically embeds reward alignment capability into the model architecture. By leveraging learnable single-step stochastic flows, Diamond Maps enables both efficient sampling and precise alignment. The method integrates GLASS Flows distillation, value function estimation, and sequential Monte Carlo strategies, allowing rapid adaptation to any reward function at inference time without retraining. Experimental results demonstrate that Diamond Maps significantly outperforms current approaches across multiple tasks, achieving notable advances in alignment accuracy, training efficiency, and inference-time flexibility.

📝 Abstract
Flow and diffusion models produce high-quality samples, but adapting them to user preferences or constraints post-training remains costly and brittle, a challenge commonly called reward alignment. We argue that efficient reward alignment should be a property of the generative model itself, not an afterthought, and redesign the model for adaptability. We propose "Diamond Maps", stochastic flow map models that enable efficient and accurate alignment to arbitrary rewards at inference time. Diamond Maps amortize many simulation steps into a single-step sampler, like flow maps, while preserving the stochasticity required for optimal reward alignment. This design makes search, sequential Monte Carlo, and guidance scalable by enabling efficient and consistent estimation of the value function. Our experiments show that Diamond Maps can be learned efficiently via distillation from GLASS Flows, achieve stronger reward alignment performance, and scale better than existing methods. Our results point toward a practical route to generative models that can be rapidly adapted to arbitrary preferences and constraints at inference time.
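The abstract's core recipe, a single-step stochastic sampler combined with sequential Monte Carlo reweighting against a reward, can be sketched in toy form. Everything below is illustrative: `one_step_stochastic_sampler` is a hypothetical stand-in for a learned Diamond Map (its linear drift is invented for the example), and the quadratic `reward` is likewise an assumption, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_step_stochastic_sampler(x_t, t, s, noise_scale=0.5):
    # Hypothetical stand-in for a learned Diamond Map: jumps from time t
    # to time s in a single call while injecting fresh noise. Preserving
    # this stochasticity is what lets resampling steps remain consistent.
    drift = -x_t * (t - s)  # toy linear drift toward the origin
    noise = noise_scale * np.sqrt(t - s) * rng.standard_normal(x_t.shape)
    return x_t + drift + noise

def reward(x):
    # Toy reward favoring samples near 1.0 in every coordinate.
    return -np.sum((x - 1.0) ** 2, axis=-1)

def smc_align(n_particles=256, dim=2, times=(1.0, 0.5, 0.0), beta=2.0):
    """Sequential Monte Carlo at inference time: propagate particles with
    the one-step sampler, tilt their weights by the exponentiated reward,
    and resample, with no retraining of the sampler itself."""
    x = rng.standard_normal((n_particles, dim))  # particles at t = 1 (pure noise)
    for t, s in zip(times[:-1], times[1:]):
        x = one_step_stochastic_sampler(x, t, s)
        w = np.exp(beta * reward(x))   # reward-tilted importance weights
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x = x[idx]                     # multinomial resampling
    return x

samples = smc_align()
print(samples.mean(axis=0))  # mean is pulled toward the reward peak at 1.0
```

Because each Diamond Map call covers a long time span in one step, the particle population needs only a few propagate-reweight-resample rounds, which is what makes this kind of inference-time search scalable.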
Problem

Research questions and friction points this paper is trying to address.

reward alignment
generative models
inference-time adaptation
flow models
diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diamond Maps
reward alignment
stochastic flow maps
value function estimation
generative model adaptability
👥 Authors
Peter Holderrieth
MIT CSAIL
Douglas Chen
Carnegie Mellon University
L. Eyring
TU Munich, Helmholtz Munich, MCML
Ishin Shah
Carnegie Mellon University
Giri Anantharaman
Carnegie Mellon University
Yutong He
Carnegie Mellon University (machine learning)
Zeynep Akata
Professor at Technical University of Munich and Director at Helmholtz Munich (Machine Learning, Vision and Language, Zero-Shot Learning)
T. Jaakkola
MIT CSAIL
N. Boffi
Carnegie Mellon University
Max Simchowitz
MIT (Machine Learning Theory, Robotics, Control)