🤖 AI Summary
Neural radiance field (NeRF) modeling of unconstrained real-world scenes, such as tourist photo collections, remains challenging: existing methods operate under a closed-world assumption and lack semantic priors acquired from real-world imagery.
Method: This work relaxes the closed-world assumption by integrating semantic priors from pre-trained CNN/Vision Transformer networks into the K-Planes planar scene representation. It proposes a prior-guided alternating optimization strategy: building on this planar representation, geometry and appearance are jointly optimized through multi-stage feature distillation and a rendering loss.
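To make the alternating optimization idea concrete, here is a minimal, hypothetical sketch: even steps take a gradient step on a rendering objective, odd steps on a distillation objective against a frozen prior, both acting on the same scene parameters. All names and losses are illustrative stand-ins (simple quadratic losses on vectors), not the paper's actual implementation.

```python
import numpy as np

def alternating_fit(theta, render_target, prior_target, lr=0.1, steps=200):
    """Toy alternating optimization: even steps fit the rendering target,
    odd steps fit the (frozen) pre-trained prior target."""
    theta = np.asarray(theta, dtype=float).copy()
    for step in range(steps):
        target = render_target if step % 2 == 0 else prior_target
        theta -= lr * 2.0 * (theta - target)  # gradient of ||theta - target||^2
    return theta

rng = np.random.default_rng(0)
pixels = rng.normal(size=8)   # stand-in for rendering supervision
prior = rng.normal(size=8)    # stand-in for pre-trained feature supervision
theta0 = rng.normal(size=8)   # stand-in for K-Planes parameters

theta = alternating_fit(theta0, pixels, prior)
```

With quadratic losses the iterates converge to a fixed point that is a convex combination of the two targets, which illustrates the trade-off the alternating schedule negotiates between photometric fidelity and prior guidance.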
Contribution/Results: The method achieves notable improvements in novel-view synthesis quality on both synthetic and real-world outdoor imagery, yielding richer geometric and textural detail, and consistently outperforms state-of-the-art baselines on PSNR and SSIM. This points toward efficient, high-fidelity NeRF modeling in open-domain scenes.
📝 Abstract
Modeling large scenes from unconstrained images has proven to be a major challenge in computer vision. Existing methods tackling in-the-wild scene modeling operate in closed-world settings, where no conditioning on priors acquired from real-world images is present. We propose RefinedFields, which is, to the best of our knowledge, the first method leveraging pre-trained models to improve in-the-wild scene modeling. We employ pre-trained networks to refine K-Planes representations via optimization guidance using an alternating training procedure. We carry out extensive experiments and verify the merit of our method on synthetic data and real tourism photo collections. RefinedFields enhances rendered scenes with richer details and improves upon its base representation on the task of novel view synthesis in the wild. Our project page can be found at https://refinedfields.github.io.