🤖 AI Summary
Addressing the challenge of balancing speed, geometric detail, and reconstruction fidelity in single-image 3D reconstruction, this paper proposes LatentDreamer—a novel framework built upon a pretrained variational autoencoder (VAE) that maps 3D geometry into a compact latent space for efficient, high-fidelity 3D generation from a single image. Its key contributions are: (1) a learnable latent feature representation that drastically reduces 3D modeling complexity; and (2) a progressive, sequential generation pipeline—coarse-to-fine geometry followed by texture synthesis—that jointly ensures structural integrity and surface realism. With minimal fine-tuning, LatentDreamer reconstructs high-quality 3D models in approximately 70 seconds per image. It achieves competitive performance on standard metrics such as FID and Chamfer distance, advancing the practicality and scalability of single-image 3D generation.
📝 Abstract
3D assets are essential in the digital age. While automatic 3D generation, such as image-to-3D, has made significant strides in recent years, it often struggles to achieve fast, detailed, and high-fidelity generation simultaneously. In this work, we introduce LatentDreamer, a novel framework for generating 3D objects from single images. The key to our approach is a pre-trained variational autoencoder that maps 3D geometries to latent features, which greatly reduces the difficulty of 3D generation. Starting from latent features, the pipeline of LatentDreamer generates coarse geometries, refined geometries, and realistic textures sequentially. The 3D objects generated by LatentDreamer exhibit high fidelity to the input images, and the entire generation process completes within a short time (typically 70 seconds). Extensive experiments show that with only a small amount of training, LatentDreamer demonstrates competitive performance compared to contemporary approaches.
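The staged pipeline the abstract describes — encode geometry into a compact latent, then sequentially produce coarse geometry, refined geometry, and texture — can be sketched in miniature. This is a minimal illustrative sketch only: the dimensions, the linear "encoder", and all stage functions below are hypothetical placeholders, not the paper's actual VAE or networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: flattened 3D geometry of 3072 dims, compact latent of 64.
GEOM_DIM, LATENT_DIM = 3072, 64

# Toy linear map standing in for the pretrained VAE encoder (mean only).
W_enc = rng.standard_normal((GEOM_DIM, LATENT_DIM)) / np.sqrt(GEOM_DIM)

def encode(geometry):
    """Map a flattened 3D geometry to a compact latent feature."""
    return geometry @ W_enc

def coarse_stage(latent):
    """Stage 1: expand the latent into a coarse geometry (placeholder upsample)."""
    return np.repeat(latent, GEOM_DIM // LATENT_DIM)

def refine_stage(coarse):
    """Stage 2: refine the coarse geometry (placeholder smoothing)."""
    return 0.5 * (coarse + np.roll(coarse, 1))

def texture_stage(refined):
    """Stage 3: synthesize per-point texture values in [0, 1] (placeholder)."""
    return 1.0 / (1.0 + np.exp(-refined))

geometry = rng.standard_normal(GEOM_DIM)
latent = encode(geometry)                            # compact representation
texture = texture_stage(refine_stage(coarse_stage(latent)))
print(latent.shape, texture.shape)                   # (64,) (3072,)
```

The point of the sketch is the data flow: working in a 64-dim latent rather than the full geometry is what makes the downstream stages cheap, and each stage consumes only the previous stage's output, matching the sequential coarse-to-fine-to-texture ordering.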