🤖 AI Summary
Existing 3D-aware GANs generate high-fidelity images but lack a unified framework for reference-driven, geometrically consistent 3D editing. This paper introduces the first end-to-end framework for spatial editing in triplane space (e.g., the triplane representations produced by EG3D), enabling category-agnostic, reference-guided, 3D-consistent image editing. The method integrates automatic key-region localization, spatially disentangled representation learning, latent-space alignment, and multi-scale feature fusion, and it preserves fine-grained geometric consistency across diverse categories, including faces, animals, cartoons, and garments. Evaluated on multiple 3D editing benchmarks, the approach achieves state-of-the-art performance, substantially outperforming both text- and image-guided 2D/3D diffusion models and prior GAN-based methods. Comprehensive qualitative and quantitative experiments demonstrate superior geometric fidelity and visual quality.
📝 Abstract
Generative Adversarial Networks (GANs) have emerged as powerful tools for high-quality image generation and for real-image editing via manipulation of their latent spaces. Recent advances in GANs include 3D-aware models such as EG3D, whose efficient triplane-based architectures can reconstruct 3D geometry from a single image. However, limited attention has been given to an integrated framework for 3D-aware, high-quality, reference-based image editing. This study addresses that gap by exploring and demonstrating the effectiveness of the triplane space for advanced reference-based edits. Our approach integrates encoding, automatic localization, spatial disentanglement of triplane features, and fusion learning to achieve the desired edits. We demonstrate that it excels across diverse domains, including human faces, 360-degree heads, animal faces, partially stylized edits such as cartoon faces, full-body clothing edits, and edits on class-agnostic samples. Our method achieves state-of-the-art performance, both qualitatively and quantitatively, against relevant latent-direction-, text-, and image-guided 2D and 3D-aware diffusion and GAN methods.
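To make the pipeline in the abstract concrete, below is a minimal, hypothetical Python sketch of the core editing step it describes: localize an edit region, spatially disentangle the corresponding triplane features, and fuse reference features into the source planes. All names (`localize_region`, `blend_triplanes`) and shapes (3 planes of 256x256x32 features) are illustrative assumptions, not the paper's actual API; only the EG3D-style triplane layout is taken from the text, and the learned localization and fusion networks are replaced here by a broadcast mask and a linear blend.

```python
import numpy as np

# Assumed EG3D-style triplane layout: 3 axis-aligned planes (XY, XZ, YZ),
# each an H x W grid of C-dimensional features. Shapes are illustrative.
N_PLANES, H, W, C = 3, 256, 256, 32

def localize_region(mask_2d: np.ndarray) -> np.ndarray:
    """Stand-in for the paper's automatic localization: lift a 2D edit
    mask onto each triplane as a soft per-plane mask.

    In the actual method this is learned; here we simply broadcast the
    same soft mask to all three planes for illustration.
    """
    soft = mask_2d.astype(np.float32)
    return np.stack([soft] * N_PLANES)           # (3, H, W)

def blend_triplanes(src: np.ndarray,
                    ref: np.ndarray,
                    mask: np.ndarray) -> np.ndarray:
    """Stand-in for spatially disentangled fusion: keep source features
    outside the edit region and take reference features inside it."""
    m = mask[..., None]                           # (3, H, W, 1) broadcasts over C
    return (1.0 - m) * src + m * ref              # (3, H, W, C)

# Toy usage with random features; a real pipeline would obtain these
# by encoding the source and reference images into triplane space.
src_planes = np.random.randn(N_PLANES, H, W, C).astype(np.float32)
ref_planes = np.random.randn(N_PLANES, H, W, C).astype(np.float32)
mask = np.zeros((H, W), dtype=np.float32)
mask[96:160, 96:160] = 1.0                        # pretend localized key region

edited = blend_triplanes(src_planes, ref_planes, localize_region(mask))
print(edited.shape)                               # (3, 256, 256, 32)
```

The edited planes would then be passed through the generator's neural renderer, which is why a single triplane-space blend can stay 3D-consistent across viewpoints, in contrast to per-view 2D compositing.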