3D Stylization via Large Reconstruction Model

📅 2025-04-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address weak appearance-style controllability and multi-view inconsistency in text/image-guided 3D generation, this paper proposes a zero-shot, single-step 3D model appearance stylization method. Without fine-tuning or optimization, it injects CLIP-extracted style features from a reference image directionally into the cross-layer attention modules of a pre-trained large-scale 3D reconstruction model (e.g., MVDiffusion), explicitly leveraging its implicitly encoded appearance representations for style transfer. The key contribution is the first discovery and exploitation of the implicit global-appearance modeling capability embedded in the attention mechanisms of 3D reconstruction models, enhanced by spatial alignment and CLIP-semantic guidance to ensure multi-view consistency. Experiments demonstrate that our method achieves state-of-the-art performance in visual quality, multi-view consistency, and inference speed, significantly outperforming existing style transfer and 3D editing approaches.

Technology Category

Application Category

📝 Abstract
With the growing success of text or image guided 3D generators, users demand more control over the generation process, appearance stylization being one of them. Given a reference image, this requires adapting the appearance of a generated 3D asset to reflect the visual style of the reference while maintaining visual consistency from multiple viewpoints. To tackle this problem, we draw inspiration from the success of 2D stylization methods that leverage the attention mechanisms in large image generation models to capture and transfer visual style. In particular, we probe if large reconstruction models, commonly used in the context of 3D generation, has a similar capability. We discover that the certain attention blocks in these models capture the appearance specific features. By injecting features from a visual style image to such blocks, we develop a simple yet effective 3D appearance stylization method. Our method does not require training or test time optimization. Through both quantitative and qualitative evaluations, we demonstrate that our approach achieves superior results in terms of 3D appearance stylization, significantly improving efficiency while maintaining high-quality visual outcomes.
Problem

Research questions and friction points this paper is trying to address.

Control appearance stylization in 3D generation
Transfer visual style from reference image to 3D asset
Achieve efficient 3D stylization without training or optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages attention mechanisms in large models
Injects style features into reconstruction model blocks
Requires no training or optimization during testing
🔎 Similar Papers
No similar papers found.