GeoDiffuser: Geometry-Based Image Editing with Diffusion Models

📅 2024-04-22
🏛️ arXiv.org
📈 Citations: 5
Influential: 0
🤖 AI Summary
Existing image editing methods suffer from significant limitations in geometric accuracy, 3D consistency, occlusion recovery, and photorealistic lighting. To address these challenges, the paper proposes GeoDiffuser, a zero-shot, geometry-driven image editing framework that models common 2D/3D object manipulations, including translation, 3D rotation, and removal, as geometric transformations. These transformations are injected directly into the attention layers of a pretrained diffusion model, requiring no training or fine-tuning. The method combines SAM-based foreground segmentation with an optimization objective that jointly enforces object-style preservation, geometric plausibility, and visual realism (e.g., consistent lighting and shadows), while inpainting disoccluded regions where the object originally stood. Quantitative evaluations and a perceptual user study show substantial improvements over state-of-the-art approaches; the authors position this as the first training-free, geometry-explicit, 3D-aware method for high-fidelity image editing with natural occlusion recovery.

📝 Abstract
The success of image generative models has enabled us to build methods that can edit images based on text or other user input. However, these methods are bespoke, imprecise, require additional information, or are limited to only 2D image edits. We present GeoDiffuser, a zero-shot optimization-based method that unifies common 2D and 3D image-based object editing capabilities into a single method. Our key insight is to view image editing operations as geometric transformations. We show that these transformations can be directly incorporated into the attention layers in diffusion models to implicitly perform editing operations. Our training-free optimization method uses an objective function that seeks to preserve object style but generate plausible images, for instance with accurate lighting and shadows. It also inpaints disoccluded parts of the image where the object was originally located. Given a natural image and user input, we segment the foreground object using SAM and estimate a corresponding transform which is used by our optimization approach for editing. GeoDiffuser can perform common 2D and 3D edits like object translation, 3D rotation, and removal. We present quantitative results, including a perceptual study, that show how our approach is better than existing methods. Visit https://ivl.cs.brown.edu/research/geodiffuser.html for more information.
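The abstract's key idea, treating an edit as a geometric transformation applied to spatial attention, can be illustrated with a toy sketch. The function and names below (`warp_attention`, the affine matrix) are hypothetical illustrations, not GeoDiffuser's actual implementation, which optimizes shared attention inside a diffusion model's denoising process rather than warping a single map.

```python
import numpy as np

def warp_attention(attn, affine):
    """Warp a (H, W) spatial attention map by a 2x3 affine transform
    using nearest-neighbor sampling; out-of-bounds sources map to 0.
    Hypothetical helper for illustration only."""
    H, W = attn.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Homogeneous coordinates of every target pixel, shape (3, H*W).
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])
    # Invert the transform: for each target pixel, look up its source.
    A = np.vstack([affine, [0.0, 0.0, 1.0]])
    src = np.linalg.inv(A) @ coords
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < W) & (sy >= 0) & (sy < H)
    out = np.zeros(H * W)
    out[valid] = attn[sy[valid], sx[valid]]
    return out.reshape(H, W)

# Example: a one-hot attention peak at (row 3, col 3), translated
# 2 pixels to the right by the edit transform.
attn = np.zeros((8, 8))
attn[3, 3] = 1.0
shift = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 0.0]])
moved = warp_attention(attn, shift)  # peak now at (row 3, col 5)
```

In the paper's setting, an analogous warp ties the edited image's attention to the reference image's attention, so the object's appearance follows the user-specified transform while the optimization objective handles lighting, shadows, and disocclusion.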
Problem

Research questions and friction points this paper is trying to address.

Image Editing
3D Image Processing
Realism

Innovation

Methods, ideas, or system contributions that make the work stand out.

GeoDiffuser
Shape Editing
Diffusion Model