GazeFusion: Saliency-guided Image Generation

📅 2024-03-16
🏛️ ACM Transactions on Applied Perception
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing diffusion models support text-guided and spatial-layout control but lack explicit modeling or regulation of human visual attention distributions, limiting their applicability in human-factor-sensitive scenarios. To address this, we propose a saliency-guided image generation framework that, for the first time, explicitly incorporates human visual attention priors into the diffusion process—enabling fine-grained, editable control over gaze regions (e.g., enhancement, suppression, and multi-scenario adaptation). Our method builds a saliency-conditioned control module atop the ControlNet architecture, grounded in empirical eye-tracking validation and large-scale model-driven saliency analysis. We further introduce a joint text–saliency conditional sampling strategy. Experiments demonstrate that generated images exhibit gaze distributions highly aligned with user-specified saliency maps, achieving a 42% reduction in KL divergence relative to baselines and significantly improving attention controllability.

Technology Category

Application Category

📝 Abstract
Diffusion models offer unprecedented image generation power given just a text prompt. While emerging approaches for controlling diffusion models have enabled users to specify the desired spatial layouts of the generated content, they cannot predict or control where viewers will pay more attention due to the complexity of human vision. Recognizing the significance of attention-controllable image generation in practical applications, we present a saliency-guided framework to incorporate the data priors of human visual attention mechanisms into the generation process. Given a user-specified viewer attention distribution, our control module conditions a diffusion model to generate images that attract viewers’ attention toward the desired regions. To assess the efficacy of our approach, we performed an eye-tracked user study and a large-scale model-based saliency analysis. The results evidence that both the cross-user eye gaze distributions and the saliency models’ predictions align with the desired attention distributions. Lastly, we outline several applications, including interactive design of saliency guidance, attention suppression in unwanted regions, and adaptive generation for varied display/viewing conditions.
Problem

Research questions and friction points this paper is trying to address.

Control viewer attention in image generation
Integrate human visual attention mechanisms
Generate images with specific attention areas
Innovation

Methods, ideas, or system contributions that make the work stand out.

Saliency-guided diffusion model
Attention-controllable image generation
Eye-tracked user study
🔎 Similar Papers
No similar papers found.
Y
Yunxiang Zhang
New York University, USA
N
Nan Wu
Stanford University, USA
C
Connor Z. Lin
Stanford University, USA
Gordon Wetzstein
Gordon Wetzstein
Associate Professor of Electrical Engineering and Computer Science, Stanford University
Computational ImagingComputational DisplaysComputational OpticsNeural Rendering
Q
Qi Sun
New York University, USA