PanoDreamer: Consistent Text to 360-Degree Scene Generation

📅 2025-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D scene generation methods, whether driven by text or reference images, struggle to ensure both geometric consistency and high-fidelity texture synthesis, particularly when extrapolating to full 360° panoramic scenes, where structural distortions and inter-view texture inconsistencies frequently arise. To address this, we propose a warp-refine pipeline coupled with an LLM-driven multi-view consistency optimization mechanism. Our approach integrates 3D Gaussian Splatting into an end-to-end 360° scene generation framework that supports joint text-and-image conditioning. It comprises three core components: warp-guided coarse initialization, LLM-scheduled collaborative multi-view refinement, and iterative point cloud expansion. Together these yield geometrically accurate, texture-coherent, and interactively renderable panoramic 3D scenes. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches in both reconstruction fidelity and inter-view consistency.

📝 Abstract
Automatically generating a complete 3D scene from a text description, a reference image, or both has significant applications in fields like virtual reality and gaming. However, current methods often generate low-quality textures and inconsistent 3D structures, especially when extrapolating significantly beyond the field of view of the reference image. To address these challenges, we propose PanoDreamer, a novel framework for consistent 3D scene generation with flexible text and image control. Our approach employs a large language model and a warp-refine pipeline, first generating an initial set of images and then compositing them into a 360-degree panorama. This panorama is then lifted into 3D to form an initial point cloud. We then use several approaches to generate additional images from different viewpoints that are consistent with the initial point cloud, and use them to expand and refine that point cloud. Given the resulting set of images, we utilize 3D Gaussian Splatting to create the final 3D scene, which can then be rendered from arbitrary viewpoints. Experiments demonstrate the effectiveness of PanoDreamer in generating high-quality, geometrically consistent 3D scenes.
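The "lifted into 3D" step in the abstract can be illustrated concretely: each pixel of an equirectangular panorama corresponds to a viewing direction on the sphere, and scaling that direction by an estimated depth gives a 3D point. The following is a minimal sketch of that lifting, assuming an equirectangular layout and per-pixel depth; the function names and conventions are illustrative, not taken from the paper.

```python
import math

def equirect_to_ray(u, v, width, height):
    """Map pixel (u, v) of an equirectangular panorama to a unit 3D direction.
    Convention (assumed): longitude spans [-pi, pi) across the image width,
    latitude spans [pi/2, -pi/2] top to bottom."""
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)

def lift_pixel(u, v, depth, width, height):
    """Lift one panorama pixel to a 3D point by scaling its ray by depth."""
    dx, dy, dz = equirect_to_ray(u, v, width, height)
    return (dx * depth, dy * depth, dz * depth)
```

Applying `lift_pixel` to every pixel with a monocular depth estimate produces the initial point cloud that the later refinement stages expand.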
Problem

Research questions and friction points this paper is trying to address.

Generating consistent 3D scenes from text or images
Improving low-quality textures and 3D inconsistencies
Extrapolating beyond reference image field of view
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates 360-degree panorama from text/image
Uses warp-refine pipeline for consistency
Applies 3D Gaussian Splatting for final scene
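The warp step of a warp-refine pipeline can be sketched as reprojecting the current point cloud into a new camera and measuring which image regions the warped points fail to cover; those holes are what the refinement (inpainting) model must fill. Below is a minimal sketch under simplifying assumptions (a pinhole camera looking down +z, no rotation); all names are illustrative and not from the paper.

```python
def project_point(p, cam_pos, focal, width, height):
    """Project a 3D point into a pinhole camera at cam_pos looking down +z.
    Returns pixel coords, or None if the point is behind the camera or
    falls outside the image (leaving a hole to inpaint)."""
    x = p[0] - cam_pos[0]
    y = p[1] - cam_pos[1]
    z = p[2] - cam_pos[2]
    if z <= 1e-6:
        return None
    u = focal * x / z + width / 2.0
    v = focal * y / z + height / 2.0
    if 0.0 <= u < width and 0.0 <= v < height:
        return (u, v)
    return None

def warp_coverage(points, cam_pos, focal, width, height, grid=8):
    """Fraction of a coarse grid-by-grid pixel grid hit by warped points.
    Low coverage flags viewpoints where refinement has the most to fill."""
    hit = set()
    for p in points:
        uv = project_point(p, cam_pos, focal, width, height)
        if uv is not None:
            hit.add((int(uv[0] * grid / width), int(uv[1] * grid / height)))
    return len(hit) / float(grid * grid)
```

In a full pipeline, the refined (inpainted) views would be lifted back into 3D to expand the point cloud, and the loop would repeat until coverage is sufficient for 3D Gaussian Splatting training.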