🤖 AI Summary
Existing methods suffer from domain shift when mapping 2D features to 3D, degrading rendering fidelity. This paper proposes a 3D-aware 2D latent representation framework comprising three stages: correspondence-aware self-encoding, latent radiance field (LRF) construction, and VAE-RF co-alignment. We theoretically establish—and empirically verify—that photo-realistic radiance fields can be constructed solely from 2D latent representations. We introduce a 3D geometrically aware paradigm for modeling the 2D latent space and a novel VAE-RF alignment mechanism, augmented by cross-view consistency regularization. Our approach significantly outperforms state-of-the-art methods in synthesis quality and cross-dataset generalization, achieving high-fidelity novel-view synthesis across diverse indoor and outdoor scenes.
📝 Abstract
Latent 3D reconstruction has shown great promise in empowering 3D semantic understanding and 3D generation by distilling 2D features into the 3D space. However, existing approaches struggle with the domain gap between 2D feature space and 3D representations, resulting in degraded rendering performance. To address this challenge, we propose a novel framework that integrates 3D awareness into the 2D latent space. The framework consists of three stages: (1) a correspondence-aware autoencoding method that enhances the 3D consistency of 2D latent representations, (2) a latent radiance field (LRF) that lifts these 3D-aware 2D representations into 3D space, and (3) a VAE-Radiance Field (VAE-RF) alignment strategy that improves image decoding from the rendered 2D representations. Extensive experiments demonstrate that our method outperforms the state-of-the-art latent 3D reconstruction approaches in terms of synthesis performance and cross-dataset generalizability across diverse indoor and outdoor scenes. To our knowledge, this is the first work showing the radiance field representations constructed from 2D latent representations can yield photorealistic 3D reconstruction performance.