🤖 AI Summary
Existing 3D scene editing methods rely on per-scene optimization over explicit 3D representations or on cascaded edit-and-reconstruct pipelines, which leads to high test-time cost, limited 3D awareness, and structural inconsistencies. This work proposes JointEdit3D, a feed-forward editing framework built on a unified RGB-geometry generative latent space. It performs asymmetric latent inpainting: only a single edited RGB reference latent is observed, and the remaining RGB views and the edited geometry latent are generated under source-scene anchoring. A SceneAnchor Branch injects structural priors from the source scene without forcing direct copying, and edit/background-aware losses balance fidelity in edited regions against preservation of unedited content. The work also introduces SceneEdit3D-15K, a dataset of 15K paired editing samples with renderer-provided 3D annotations, and SceneEdit3D-Bench, a curated 100-sample benchmark. Experiments show improved edited-region quality and 3D structural completeness over prior baselines, with competitive background preservation.
📝 Abstract
Existing 3D scene editing methods typically rely on per-scene optimization over explicit 3D representations or cascaded edit-and-reconstruct pipelines, resulting in high test-time cost, limited 3D awareness, and structural inconsistencies. To couple appearance synthesis and geometry prediction during editing, we build on a unified RGB-geometry reconstruction-generation latent space and adapt it to feed-forward 3D scene editing. The resulting framework, JointEdit3D, performs asymmetric latent inpainting by observing only a single edited RGB reference latent and generating the remaining RGB views and edited geometry latent under source-scene anchoring. JointEdit3D introduces a dedicated SceneAnchor Branch to inject source-scene structure without forcing direct copying, and adopts edit/background-aware losses to balance edited-region fidelity with unedited-content preservation. To address the lack of paired resources for standardized 3D scene editing evaluation, we introduce SceneEdit3D-15K, a dataset with 15K paired editing samples and renderer-provided 3D annotations, together with SceneEdit3D-Bench, a curated 100-sample benchmark. Experiments show that JointEdit3D improves edited-region quality and 3D structural completeness over prior baselines while maintaining competitive background preservation.
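The abstract does not specify the form of the edit/background-aware losses. As a rough illustration of the general idea only, the sketch below assumes a masked L2 formulation: an edit mask splits each element into an edited-region term (compared against the edited target) and a background term (compared against the source scene), and two weights trade them off. The function name, the arguments, the L2 choice, and the weights `w_edit` and `w_bg` are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_background_loss(
    pred: Sequence[float],
    target_edit: Sequence[float],
    target_source: Sequence[float],
    edit_mask: Sequence[float],
    w_edit: float = 1.0,
    w_bg: float = 1.0,
) -> float:
    """Hypothetical masked L2 loss (an assumption, not the paper's definition).

    edit_mask[i] is 1.0 inside the edited region and 0.0 in the background.
    Edited elements are compared with target_edit, background elements with
    target_source, and each term is averaged over its own region.
    """
    n = len(pred)
    if not (len(target_edit) == len(target_source) == len(edit_mask) == n):
        raise ValueError("all inputs must have the same length")

    edit_sum = bg_sum = 0.0
    edit_count = bg_count = 0.0
    for p, t_edit, t_src, m in zip(pred, target_edit, target_source, edit_mask):
        edit_sum += m * (p - t_edit) ** 2        # fidelity inside the edit
        bg_sum += (1.0 - m) * (p - t_src) ** 2   # preservation outside it
        edit_count += m
        bg_count += 1.0 - m

    # Guard against an empty region so the loss stays finite.
    edit_term = edit_sum / edit_count if edit_count > 0 else 0.0
    bg_term = bg_sum / bg_count if bg_count > 0 else 0.0
    return w_edit * edit_term + w_bg * bg_term
```

Averaging each term over its own region keeps a small edited area from being swamped by a large background, which is one plausible way to realize the balance the abstract describes; the actual weighting scheme may differ.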