GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current text-to-3D stylization methods rely on per-asset optimization, suffering from low efficiency and multi-view inconsistency, which hinders scalability for industrial production. This paper proposes the first feed-forward, text-driven 3D Gaussian stylization framework, enabling high-fidelity edits in seconds without test-time optimization. The core innovation is a decoupled latent-space architecture: spatially grouped 3D Gaussians explicitly disentangle geometry and appearance control, and, coupled with a text-conditioned latent diffusion model, achieve geometry-preserving, multi-view-consistent style transfer. Experiments demonstrate that the method surpasses optimization-based baselines in fidelity, multi-view consistency, and inference speed, significantly enhancing practicality for large-scale 3D content creation.

📝 Abstract
3D stylization is central to game development, virtual reality, and digital arts, where the demand for diverse assets calls for scalable methods that support fast, high-fidelity manipulation. Existing text-to-3D stylization methods typically distill from 2D image editors, requiring time-intensive per-asset optimization and exhibiting multi-view inconsistency due to the limitations of current text-to-image models, which makes them impractical for large-scale production. In this paper, we introduce GaussianBlender, a pioneering feed-forward framework for text-driven 3D stylization that performs edits instantly at inference. Our method learns structured, disentangled latent spaces with controlled information sharing for geometry and appearance from spatially grouped 3D Gaussians. A latent diffusion model then applies text-conditioned edits on these learned representations. Comprehensive evaluations show that GaussianBlender not only delivers instant, high-fidelity, geometry-preserving, multi-view-consistent stylization, but also surpasses methods that require per-instance test-time optimization, unlocking practical, democratized 3D stylization at scale.
Problem

Research questions and friction points this paper is trying to address.

Enables instant text-driven 3D stylization without per-asset optimization
Achieves multi-view consistent edits by learning disentangled geometry and appearance latents
Provides scalable, high-fidelity 3D asset manipulation for practical production use
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feed-forward framework for instant 3D stylization
Disentangled latent spaces from grouped 3D Gaussians
Latent diffusion model applies text-conditioned edits
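The pipeline described above can be illustrated with a toy sketch. Everything here is hypothetical: the real system uses learned grouping and encoder networks and a trained latent diffusion model, whereas this sketch uses random linear maps and a simple latent interpolation as a stand-in for the text-conditioned edit. What it shows is the structural idea: Gaussians are spatially grouped, geometry and appearance are encoded into separate latents, and only the appearance latent is touched by the edit, so geometry is preserved by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: number of Gaussians, spatial groups, latent dim.
N_GAUSSIANS, N_GROUPS, D_LATENT = 512, 8, 16

def group_gaussians(params, n_groups):
    """Spatially group Gaussians (here: a coarse sort along x)."""
    order = np.argsort(params[:, 0])
    return np.array_split(params[order], n_groups)

def encode(groups, proj):
    """Encode each group into a fixed-size latent (mean pool + linear map)."""
    return np.stack([g.mean(axis=0) @ proj for g in groups])

# Each Gaussian: 3 position + 4 rotation + 3 scale + 3 color = 13 dims.
gaussians = rng.normal(size=(N_GAUSSIANS, 13))
groups = group_gaussians(gaussians, N_GROUPS)

# Disentangled latents: geometry from position/rotation/scale channels,
# appearance from color channels; separate projections enforce the split.
W_geo = rng.normal(size=(10, D_LATENT))
W_app = rng.normal(size=(3, D_LATENT))
z_geo = encode([g[:, :10] for g in groups], W_geo)
z_app = encode([g[:, 10:] for g in groups], W_app)

def stylize(z_app, text_embedding, strength=0.5):
    """Stand-in for the text-conditioned latent diffusion edit:
    only the appearance latent is modified."""
    return (1 - strength) * z_app + strength * text_embedding

text_emb = rng.normal(size=(D_LATENT,))  # hypothetical text embedding
z_app_edited = stylize(z_app, text_emb)

# Geometry latents are never written to during stylization, which is the
# feed-forward analogue of "geometry-preserving" editing.
print(z_geo.shape, z_app_edited.shape)  # (8, 16) (8, 16)
```

The design point the sketch makes explicit: because the edit operates only on the appearance latent, multi-view consistency and geometry preservation come from the representation itself rather than from per-asset optimization.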