CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of unifying style transfer across 2D images, videos, 3D objects, and 4D dynamic scenes, a task previously unexplored in Gaussian Splatting (GS)-based frameworks. We propose the first GS-native, end-to-end style transfer framework supporting both text- and image-guided stylization, jointly optimizing geometry deformation and appearance directly on GS primitives. Rather than relying on color-space transformations alone, the method combines differentiable GS rendering, CLIP-based cross-modal semantic alignment, and gradient-driven primitive-level optimization, and it requires neither large generative models nor retraining from scratch. The approach extends style transfer to the 3D and 4D content represented by GS, targeting temporal coherence for videos and joint color and geometry stylization for 3D/4D content while preserving the model size. Experiments show superior style fidelity and consistency across all modalities.

📝 Abstract
Gaussian Splatting (GS) has recently emerged as an efficient representation for rendering 3D scenes from 2D images and has been extended to images, videos, and dynamic 4D content. However, applying style transfer to GS-based representations, especially beyond simple color changes, remains challenging. In this work, we introduce CLIPGaussians, the first unified style transfer framework that supports text- and image-guided stylization across multiple modalities: 2D images, videos, 3D objects, and 4D scenes. Our method operates directly on Gaussian primitives and integrates into existing GS pipelines as a plug-in module, without requiring large generative models or retraining from scratch. The CLIPGaussians approach enables joint optimization of color and geometry in 3D and 4D settings and achieves temporal coherence in videos, while preserving the model size. We demonstrate superior style fidelity and consistency across all tasks, validating CLIPGaussians as a universal and efficient solution for multimodal style transfer.

Problem

Research questions and friction points this paper is trying to address.

Enabling style transfer for Gaussian Splatting representations
Supporting text- and image-guided stylization across 2D/3D/4D modalities
Achieving style fidelity without large generative models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified style transfer for multiple modalities
Plug-in module for Gaussian Splatting pipelines
Joint optimization of color and geometry
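
The text- and image-guided stylization described above rests on comparing embeddings in CLIP's shared space. The summary does not give the exact loss, so the following is only a minimal sketch of one common form of CLIP guidance, a cosine-distance loss, and not the paper's objective. The function name `cosine_style_loss` and the toy 3-dimensional vectors are hypothetical; in practice the vectors would come from a CLIP image encoder applied to a rendered view and a CLIP text or image encoder applied to the style prompt.

```python
import numpy as np


def cosine_style_loss(render_emb, style_emb):
    """Return 1 - cosine similarity between two embedding vectors.

    Hypothetical helper: 0 means the embeddings point the same way,
    1 means they are orthogonal, 2 means they are opposite.
    """
    render_emb = np.asarray(render_emb, dtype=float)
    style_emb = np.asarray(style_emb, dtype=float)
    denom = np.linalg.norm(render_emb) * np.linalg.norm(style_emb)
    if denom == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - float(render_emb @ style_emb) / denom


# Toy stand-ins for CLIP embeddings (assumption: real ones are much larger).
render_emb = np.array([1.0, 0.0, 0.0])  # embedding of a rendered view
style_emb = np.array([0.0, 1.0, 0.0])   # embedding of the style prompt

loss_orthogonal = cosine_style_loss(render_emb, style_emb)
loss_aligned = cosine_style_loss(style_emb, 2.0 * style_emb)
print(loss_orthogonal, loss_aligned)
```

In a GS pipeline, a loss of this kind would be backpropagated through the differentiable renderer to the color and geometry parameters of the Gaussian primitives. That step needs an autodiff framework and is omitted here.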