BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion

📅 2024-01-30
🏛️ International Joint Conference on Artificial Intelligence
📈 Citations: 9
Influential: 0
🤖 AI Summary
To address the longstanding trade-off between quality and efficiency in text-to-3D generation, this paper proposes BoostDream, a plug-and-play 3D refinement framework that elevates coarse, feed-forward-generated 3D assets to high fidelity. Methodologically, it introduces a 3D model distillation step that fits a differentiable representation to the coarse asset, designs a multi-view-aware Score Distillation Sampling (SDS) loss, and incorporates joint guidance from text prompts and multi-view-consistent normal maps, thereby mitigating the Janus problem of conventional SDS, in which the 2D diffusion prior yields duplicated frontal features across viewpoints. The framework supports diverse differentiable 3D representations, including NeRF and Gaussian Splatting, without retraining. Extensive experiments demonstrate consistent gains over state-of-the-art baselines in geometric completeness, texture realism, and inference speed, improving quality and efficiency together.

📝 Abstract
Witnessing the evolution of text-to-image diffusion models, significant strides have been made in text-to-3D generation. Currently, two primary paradigms dominate the field of text-to-3D: the feed-forward generation solutions, capable of swiftly producing 3D assets but often yielding coarse results, and the Score Distillation Sampling (SDS) based solutions, known for generating high-fidelity 3D assets albeit at a slower pace. The synergistic integration of these methods holds substantial promise for advancing 3D generation techniques. In this paper, we present BoostDream, a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality ones. The BoostDream framework comprises three distinct processes: (1) We introduce 3D model distillation that fits differentiable representations from the 3D assets obtained through feed-forward generation. (2) A novel multi-view SDS loss is designed, which utilizes a multi-view-aware 2D diffusion model to refine the 3D assets. (3) We propose to use prompts and multi-view-consistent normal maps as guidance in refinement. Our extensive experiments are conducted on different differentiable 3D representations, revealing that BoostDream excels in generating high-quality 3D assets rapidly and overcomes the Janus problem compared to conventional SDS-based methods. This breakthrough signifies a substantial advancement in both the efficiency and quality of 3D generation processes.
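The SDS mechanism the abstract builds on can be sketched in a few lines. This is a hedged, minimal sketch, not the paper's implementation: `toy_noise_predictor` is a hypothetical stand-in for a pretrained 2D diffusion model's epsilon-prediction network, which in BoostDream would be multi-view-aware and conditioned on the text prompt and normal maps.

```python
import torch

def toy_noise_predictor(noisy_img, t):
    # Hypothetical stand-in for a frozen 2D diffusion model's noise
    # prediction; a real implementation conditions on the prompt and,
    # in BoostDream, on multi-view-consistent normal maps.
    return noisy_img * 0.1

def sds_grad(rendered, alphas, t):
    """One SDS gradient on a differentiable rendering.

    rendered: image rendered from the 3D representation (requires grad).
    alphas:   cumulative noise schedule (alpha_bar), shape [T].
    t:        sampled diffusion timestep.
    """
    eps = torch.randn_like(rendered)              # ground-truth noise
    a = alphas[t]
    noisy = a.sqrt() * rendered + (1 - a).sqrt() * eps
    with torch.no_grad():
        eps_pred = toy_noise_predictor(noisy, t)  # frozen diffusion prior
    w = 1.0 - a                                   # common weighting choice
    # SDS treats w * (eps_pred - eps) as the gradient w.r.t. the rendering,
    # skipping backpropagation through the diffusion U-Net itself.
    return w * (eps_pred - eps)

# Usage: push the gradient through the renderer into the 3D parameters.
img = torch.zeros(1, 3, 8, 8, requires_grad=True)
alphas = torch.linspace(0.99, 0.01, 1000)
t = torch.randint(0, 1000, (1,)).item()
grad = sds_grad(img, alphas, t)
img.backward(gradient=grad)
```

In the paper's multi-view variant, this gradient would be computed for several camera views at once so the diffusion prior sees a consistent object, which is what counteracts the Janus problem.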
Problem

Research questions and friction points this paper is trying to address.

Refines coarse 3D assets into high-quality ones efficiently
Integrates feed-forward and SDS paradigms for better 3D generation
Overcomes the Janus problem while improving both speed and quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play refining method for 3D assets
Multi-view SDS loss using a multi-view-aware 2D diffusion model
Prompt and normal-map guidance during refinement
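The first contribution, 3D model distillation, amounts to fitting a differentiable representation to the coarse feed-forward asset before refinement begins. A hedged toy sketch, under stated assumptions: the "representation" below is just a learnable tensor per view (the paper uses NeRF or Gaussian Splatting), and `coarse_views` stands in for renders of the feed-forward asset.

```python
import torch

# Stand-in renders of the coarse feed-forward asset from 4 viewpoints.
coarse_views = [torch.rand(3, 16, 16) for _ in range(4)]

# Toy differentiable "representation": one learnable image per view.
params = [torch.zeros(3, 16, 16, requires_grad=True) for _ in coarse_views]
opt = torch.optim.Adam(params, lr=0.1)

for step in range(200):
    opt.zero_grad()
    # Distillation loss: match the coarse asset's renders view by view.
    loss = sum(torch.nn.functional.mse_loss(p, v)
               for p, v in zip(params, coarse_views))
    loss.backward()
    opt.step()

# After fitting, the representation starts refinement close to the coarse
# geometry, so the multi-view SDS stage only has to add detail.
```

The design point is initialization: starting SDS from a distilled fit of the coarse asset rather than from scratch is what makes the refinement fast.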