JointEdit3D: Feed-Forward 3D Scene Editing in a Unified Latent Space

📅 2026-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing 3D scene editing methods rely on per-scene optimization or cascaded edit-and-reconstruct pipelines, which incur high test-time cost, weak 3D awareness, and structural inconsistencies. This work proposes JointEdit3D, a feed-forward editing framework that performs asymmetric latent inpainting within a unified RGB-geometry generative latent space: a single edited RGB reference latent is observed, and the remaining RGB views and the edited geometry latent are generated. A SceneAnchor Branch injects structural priors from the source scene without forcing direct copying, and edit/background-aware losses balance fidelity in edited regions against preservation of unedited content. The work also introduces SceneEdit3D-15K, a dataset of 15K paired editing samples with renderer-provided 3D annotations, and SceneEdit3D-Bench, a curated 100-sample benchmark. Experiments show improved edited-region quality and 3D structural completeness over prior baselines, with competitive background preservation.
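To make the idea of an edit/background-aware objective concrete, here is a minimal sketch. It is not the paper's loss, whose exact form is not given in this summary. It assumes a per-pixel squared error, a binary edit mask, and two hypothetical weights `w_edit` and `w_bg`; each region is normalized by its own area so that a small edited region is not swamped by a large background.

```python
import numpy as np

def edit_background_loss(pred, target, edit_mask, w_edit=1.0, w_bg=1.0, eps=1e-8):
    """Illustrative region-weighted MSE (an assumption, not the paper's formulation).

    pred, target: float arrays of shape (H, W, C)
    edit_mask:    array of shape (H, W), 1 inside the edited region, 0 elsewhere
    """
    m = edit_mask.astype(pred.dtype)[..., None]   # (H, W, 1), broadcasts over channels
    sq_err = (pred - target) ** 2
    channels = pred.shape[-1]
    # Mean squared error restricted to the edited region.
    edit_term = (sq_err * m).sum() / (m.sum() * channels + eps)
    # Mean squared error restricted to the unedited background.
    bg_term = (sq_err * (1.0 - m)).sum() / ((1.0 - m).sum() * channels + eps)
    return w_edit * edit_term + w_bg * bg_term
```

Raising `w_bg` relative to `w_edit` in this sketch would penalize changes to unedited content more heavily, which is one simple way to express the trade-off the summary describes.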

📝 Abstract

Existing 3D scene editing methods typically rely on per-scene optimization over explicit 3D representations or cascaded edit-and-reconstruct pipelines, resulting in high test-time cost, limited 3D awareness, and structural inconsistencies. To couple appearance synthesis and geometry prediction during editing, we build on a unified RGB-geometry reconstruction-generation latent space and adapt it to feed-forward 3D scene editing. The resulting framework, JointEdit3D, performs asymmetric latent inpainting by observing only a single edited RGB reference latent and generating the remaining RGB views and edited geometry latent under source-scene anchoring. JointEdit3D introduces a dedicated SceneAnchor Branch to inject source-scene structure without forcing direct copying, and adopts edit/background-aware losses to balance edited-region fidelity with unedited-content preservation. To address the lack of paired resources for standardized 3D scene editing evaluation, we introduce SceneEdit3D-15K, a dataset with 15K paired editing samples and renderer-provided 3D annotations, together with SceneEdit3D-Bench, a curated 100-sample benchmark. Experiments show that JointEdit3D improves edited-region quality and 3D structural completeness over prior baselines while maintaining competitive background preservation.
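The asymmetric observation pattern in the abstract (one observed RGB reference latent, everything else generated) can be sketched as follows. This is only an illustration under assumptions: the function name, the flat `(num_views, latent_dim)` layout, the single geometry vector, and the Gaussian initialization of unobserved latents are all hypothetical and not taken from the paper.

```python
import numpy as np

def compose_inpainting_input(edited_ref_latent, num_views, rng, ref_view=0):
    """Build a hypothetical model input for asymmetric latent inpainting.

    edited_ref_latent: array of shape (latent_dim,), the single observed RGB latent
    Returns (rgb_latents, geometry_latent, rgb_observed).
    """
    latent_dim = edited_ref_latent.shape[0]
    # Unobserved RGB view latents start as noise (assumed initialization).
    rgb_latents = rng.standard_normal((num_views, latent_dim))
    # Only the edited reference view is observed and kept clean.
    rgb_latents[ref_view] = edited_ref_latent
    # The geometry latent is never observed, so it is generated entirely.
    geometry_latent = rng.standard_normal(latent_dim)
    # Boolean flags: True = observed, False = to be generated.
    rgb_observed = np.zeros(num_views, dtype=bool)
    rgb_observed[ref_view] = True
    return rgb_latents, geometry_latent, rgb_observed
```

The asymmetry shows up in the mask: exactly one RGB slot is marked observed, while no geometry slot is, so the geometry must be inferred jointly with the remaining views instead of being copied from an input.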

Problem

Research questions and friction points this paper is trying to address.

3D scene editing

structural inconsistency

3D awareness

test-time cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

unified latent space

feed-forward 3D editing

asymmetric latent inpainting

SceneAnchor Branch