GazeFusion: Saliency-guided Image Generation

📅 2024-03-16

🏛️ ACM Transactions on Applied Perception

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing diffusion models support text-guided and spatial-layout control but lack explicit modeling or regulation of human visual attention distributions, limiting their applicability in human-factor-sensitive scenarios. To address this, we propose a saliency-guided image generation framework that, for the first time, explicitly incorporates human visual attention priors into the diffusion process—enabling fine-grained, editable control over gaze regions (e.g., enhancement, suppression, and multi-scenario adaptation). Our method builds a saliency-conditioned control module atop the ControlNet architecture, grounded in empirical eye-tracking validation and large-scale model-driven saliency analysis. We further introduce a joint text–saliency conditional sampling strategy. Experiments demonstrate that generated images exhibit gaze distributions highly aligned with user-specified saliency maps, achieving a 42% reduction in KL divergence relative to baselines and significantly improving attention controllability.

Technology Category

Application Category

📝 Abstract

Diffusion models offer unprecedented image generation power given just a text prompt. While emerging approaches for controlling diffusion models have enabled users to specify the desired spatial layouts of the generated content, they cannot predict or control where viewers will pay more attention due to the complexity of human vision. Recognizing the significance of attention-controllable image generation in practical applications, we present a saliency-guided framework to incorporate the data priors of human visual attention mechanisms into the generation process. Given a user-specified viewer attention distribution, our control module conditions a diffusion model to generate images that attract viewers’ attention toward the desired regions. To assess the efficacy of our approach, we performed an eye-tracked user study and a large-scale model-based saliency analysis. The results evidence that both the cross-user eye gaze distributions and the saliency models’ predictions align with the desired attention distributions. Lastly, we outline several applications, including interactive design of saliency guidance, attention suppression in unwanted regions, and adaptive generation for varied display/viewing conditions.

Problem

Research questions and friction points this paper is trying to address.

Control viewer attention in image generation

Integrate human visual attention mechanisms

Generate images with specific attention areas

Innovation

Methods, ideas, or system contributions that make the work stand out.

Saliency-guided diffusion model

Attention-controllable image generation

Eye-tracked user study

🔎 Similar Papers

No similar papers found.

Authors to Follow