🤖 AI Summary
This work addresses a problem in text-to-image generation with diffusion models: the stochastic seed significantly influences output quality, yet human preference ratings are difficult to predict in advance, so compute is often wasted on low-quality generations. To mitigate this inefficiency, the authors propose a lightweight prediction mechanism that forecasts human preference scores before image generation, with minimal hardware overhead. By integrating diffusion models with a human preference metric (HPM), this approach demonstrates, for the first time, the feasibility of predicting perceptual quality before synthesis. The method enables pre-generation filtering in favor of high-quality samples, substantially improving output quality in locally deployed settings while introducing negligible additional computational cost.
📝 Abstract
Diffusion Models (DMs) have revolutionized text-driven generation by enabling the synthesis of high-quality, photorealistic visual content from user prompts. Whereas prior advances in visual generation such as VAEs and GANs were primarily evaluated on perceptual or visual similarity metrics such as FID and PSNR, advances in DMs have fostered the development of more advanced Human Preference Metrics (HPMs) that model and quantify human judgment as scalar values. However, DMs synthesize content using an inherently stochastic process in which random noise seeds generation. The initial random noise directly affects the quality of generated outputs, both qualitatively and quantitatively. This influence is pronounced in smaller models intended for local deployment. Given this phenomenon, we first investigate to what extent scalar HPM scores can be predicted before committing compute resources to generation. We then investigate to what extent such predictions can be leveraged to improve the quality of generated images, and study which HPMs are best suited for this task. Our investigation reveals that such prediction is not only possible, but also achievable with negligible hardware overhead.
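To make the idea of pre-generation filtering concrete, the following is a minimal sketch, not the authors' implementation. It assumes a predictor that maps a prompt and an initial noise sample to a scalar score; `make_noise`, `rank_seeds`, and `toy_predictor` are hypothetical names introduced here for illustration, and the toy predictor is a placeholder with no relation to any real HPM.

```python
import random
from typing import Callable, List, Sequence, Tuple

Noise = List[float]


def make_noise(seed: int, dim: int = 16) -> Noise:
    """Draw a deterministic Gaussian noise vector for a given seed.

    Stand-in for the initial latent noise of a diffusion model (assumption).
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def rank_seeds(
    prompt: str,
    seeds: Sequence[int],
    predictor: Callable[[str, Noise], float],
    keep: int = 1,
) -> List[Tuple[int, float]]:
    """Score every candidate seed before generation and keep the top `keep`."""
    scored = [(seed, predictor(prompt, make_noise(seed))) for seed in seeds]
    # Highest predicted preference score first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:keep]


def toy_predictor(prompt: str, noise: Noise) -> float:
    """Hypothetical placeholder, NOT a real HPM predictor.

    It prefers noise whose mean is close to zero, purely so the sketch runs.
    """
    return -abs(sum(noise) / len(noise))


# Only the surviving seeds would be passed on to the (expensive) generator.
best = rank_seeds("a red bicycle", seeds=range(8), predictor=toy_predictor, keep=2)
print(best)
```

In this sketch the cost of scoring a seed is one predictor call, so filtering pays off only when the predictor is much cheaper than running the diffusion model, which is the regime the abstract describes.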