Self-Evolving 3D Scene Generation from a Single Image

📅 2025-12-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging problem of reconstructing large-scale, geometrically accurate, and texture-complete 3D scenes from a single input image. We propose the first training-free, self-evolving single-image-to-3D framework that transcends conventional object-centric paradigms to enable large-scale scene reconstruction. Our method synergistically integrates geometric reasoning from 3D generative models with visual priors from video diffusion models, establishing a three-stage cross-domain iterative optimization pipeline: (i) spatial-prior-guided coarse mesh initialization, (ii) vision-guided fine-grained 3D mesh generation, and (iii) spatially constrained novel-view synthesis with inter-view consistency regularization. By orchestrating multi-model collaborative inference and joint 2D/3D cross-domain optimization, our approach significantly improves geometric stability, inter-view texture consistency, and occluded-region completion. The output is a high-fidelity, render-ready triangular mesh.
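The three-stage pipeline described above amounts to an alternating 2D/3D optimization loop: each pass refines the mesh from the current views, then renders new views from the refined mesh. A minimal control-flow sketch, with all function names and stub bodies invented here for illustration (the actual stages invoke pretrained 3D-generation and video-diffusion models, which are stubbed out):

```python
# Hypothetical sketch of the three-stage iterative loop; names are
# placeholders, not from the paper's implementation.

def init_coarse_mesh(image):
    # Stage (i): spatial-prior-guided coarse mesh initialization (stub).
    return {"mesh": "coarse", "source": image}

def refine_mesh(mesh, views):
    # Stage (ii): vision-guided fine-grained 3D mesh generation (stub).
    refined = dict(mesh)
    refined["mesh"] = "refined"
    refined["views_used"] = len(views)
    return refined

def synthesize_views(mesh, num_views):
    # Stage (iii): spatially constrained novel-view synthesis with
    # inter-view consistency regularization (stub).
    return [f"view_{i}_of_{mesh['mesh']}" for i in range(num_views)]

def evoscene_loop(image, iterations=3, num_views=4):
    mesh = init_coarse_mesh(image)
    views = [image]  # the single input view seeds the first refinement
    for _ in range(iterations):
        mesh = refine_mesh(mesh, views)            # 2D -> 3D
        views = synthesize_views(mesh, num_views)  # 3D -> 2D
    return mesh, views

mesh, views = evoscene_loop("input.png")
```

The loop structure reflects the paper's "self-evolving" framing: no training occurs, only repeated cross-domain inference in which each domain's output constrains the other's next step.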

📝 Abstract
Generating high-quality, textured 3D scenes from a single image remains a fundamental challenge in vision and graphics. Recent image-to-3D generators recover reasonable geometry from single views, but their object-centric training limits generalization to complex, large-scale scenes with faithful structure and texture. We present EvoScene, a self-evolving, training-free framework that progressively reconstructs complete 3D scenes from single images. The key idea is combining the complementary strengths of existing models: geometric reasoning from 3D generation models and visual knowledge from video generation models. Through three iterative stages--Spatial Prior Initialization, Visual-guided 3D Scene Mesh Generation, and Spatial-guided Novel View Generation--EvoScene alternates between 2D and 3D domains, gradually improving both structure and appearance. Experiments on diverse scenes demonstrate that EvoScene achieves superior geometric stability, view-consistent textures, and unseen-region completion compared to strong baselines, producing ready-to-use 3D meshes for practical applications.
Problem

Research questions and friction points this paper is trying to address.

Generating complete 3D scenes from single images with accurate geometry and texture.
Overcoming object-centric limitations to handle complex, large-scale scene reconstruction.
Integrating geometric reasoning and visual knowledge for improved structure and appearance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-evolving framework combining 3D generation and video generation models.
Iterative stages that alternate between the 2D and 3D domains.
Training-free approach that progressively improves structure and texture.