HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models

📅 2026-01-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the misalignment between diffusion model outputs and human preferences, which often manifests as semantic inaccuracies and reduced aesthetic quality. Existing alignment approaches struggle to balance generation diversity with computational efficiency. To overcome this, we propose HyperAlign, a framework that leverages a hypernetwork to dynamically generate low-rank adaptation (LoRA) weights at inference time, conditioned on the input latent variable, timestep, and text prompt to modulate the diffusion process for effective reward alignment. Notably, HyperAlign operates without modifying internal activations, substantially reducing computational overhead, and incorporates preference-data regularization to mitigate reward hacking. Evaluated on models such as Stable Diffusion and FLUX, HyperAlign consistently outperforms existing fine-tuning and test-time scaling methods, significantly enhancing both semantic fidelity and visual appeal while preserving output diversity.

📝 Abstract
Diffusion models achieve state-of-the-art performance but often fail to generate outputs that align with human preferences and intentions, resulting in images with poor aesthetic quality and semantic inconsistencies. Existing alignment methods present a difficult trade-off: fine-tuning approaches suffer from loss of diversity and reward over-optimization, while test-time scaling methods introduce significant computational overhead and tend to under-optimize. To address these limitations, we propose HyperAlign, a novel framework that trains a hypernetwork for efficient and effective test-time alignment. Instead of modifying latent states, HyperAlign dynamically generates low-rank adaptation weights to modulate the diffusion model's generation operators. This allows the denoising trajectory to be adaptively adjusted based on input latents, timesteps, and prompts for reward-conditioned alignment. We introduce multiple variants of HyperAlign that differ in how frequently the hypernetwork is applied, balancing performance against efficiency. Furthermore, we optimize the hypernetwork using a reward score objective regularized with preference data to reduce reward hacking. We evaluate HyperAlign on multiple generative paradigms, including Stable Diffusion and FLUX. It significantly outperforms existing fine-tuning and test-time scaling baselines in enhancing semantic consistency and visual appeal.
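The core mechanism described above — a hypernetwork that maps per-step conditioning (latent, timestep, prompt) to low-rank adaptation (LoRA) factors applied to a frozen weight — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the dimensions, the single-linear-map hypernetwork `H`, and the function `hyper_lora` are all hypothetical stand-ins for whatever architecture the authors actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 16, 16, 4   # layer size and LoRA rank (illustrative values)
d_cond = 8                   # pooled conditioning dim: latent + timestep + prompt features

# Frozen base weight of one diffusion-model linear layer (stand-in values).
W = rng.normal(size=(d_out, d_in))

# Hypothetical hypernetwork: here just one linear map from the conditioning
# vector to the flattened LoRA factors A (r x d_in) and B (d_out x r).
H = rng.normal(scale=0.01, size=(r * d_in + d_out * r, d_cond))

def hyper_lora(cond):
    """Generate LoRA factors from a conditioning vector and return the
    modulated weight W + (alpha / r) * B @ A, leaving W itself frozen."""
    flat = H @ cond
    A = flat[: r * d_in].reshape(r, d_in)
    B = flat[r * d_in:].reshape(d_out, r)
    alpha = 1.0  # standard LoRA scaling hyperparameter
    return W + (alpha / r) * (B @ A)

# One denoising step: pool latent/timestep/prompt features into one vector,
# regenerate the low-rank delta, and run the forward pass through it.
cond = rng.normal(size=(d_cond,))
W_mod = hyper_lora(cond)
x = rng.normal(size=(d_in,))
y = W_mod @ x  # activation of the modulated layer at this step
```

Because the hypernetwork rewrites weights rather than intermediate activations or latent states, the per-step cost is one small forward pass through `H` plus a rank-`r` update, which is where the claimed efficiency over latent-steering test-time methods comes from.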
Problem

Research questions and friction points this paper is trying to address.

diffusion models
test-time alignment
human preferences
semantic consistency
aesthetic quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hypernetwork
Test-Time Alignment
Low-Rank Adaptation
Diffusion Models
Reward Optimization