D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video

📅 2024-06-14
🏛️ arXiv.org
📈 Citations: 2 · Influential: 1
📄 PDF
🤖 AI Summary
This paper addresses dynamic novel-view synthesis from monocular videos of non-rigidly deforming scenes, proposing an efficient, high-fidelity reconstruction framework. The method explicitly separates static and dynamic regions in both geometry and appearance via a time-varying neural point cloud representation. A novel explicit initialization strategy jointly leverages monocular depth estimation and instance-segmentation priors, significantly accelerating optimization while improving geometric and textural fidelity. Hash-encoded feature grids, differentiable rasterization, and neural rendering together enable fast optimization and real-time rendering. On standard monocular benchmarks, the approach achieves competitive image quality in terms of PSNR and SSIM. Code and data are publicly available.
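
The explicit initialization strategy is the most self-contained piece to illustrate in code. Below is a minimal sketch, assuming a pinhole camera model and per-frame monocular depth plus an instance-segmentation mask; the function `backproject_frame` and its parameters are hypothetical stand-ins, not the authors' API.

```python
import numpy as np

def backproject_frame(depth, mask, fx, fy, cx, cy):
    """Lift a monocular depth map to a 3D point cloud and split it into
    static/dynamic point sets using an instance-segmentation mask.

    depth: (H, W) depth from a monocular estimator.
    mask:  (H, W) bool, True where a dynamic object was segmented.
    fx, fy, cx, cy: pinhole intrinsics.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Inverse pinhole projection: pixel + depth -> camera-space 3D point.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    dynamic = mask.reshape(-1)
    return points[~dynamic], points[dynamic]

# Toy example: a 4x4 frame with a dynamic object in one corner.
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
static_pts, dynamic_pts = backproject_frame(depth, mask, fx=50., fy=50., cx=2., cy=2.)
print(static_pts.shape, dynamic_pts.shape)  # (12, 3) (4, 3)
```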

📝 Abstract
Dynamic reconstruction and spatiotemporal novel-view synthesis of non-rigidly deforming scenes recently gained increased attention. While existing work achieves impressive quality and performance on multi-view or teleporting camera setups, most methods fail to efficiently and faithfully recover motion and appearance from casual monocular captures. This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as casual smartphone captures. Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point distribution that encodes local geometry and appearance in separate hash-encoded neural feature grids for static and dynamic regions. By sampling a discrete point cloud from our model, we can efficiently render high-quality novel views using a fast differentiable rasterizer and neural rendering network. Similar to recent work, we leverage advances in neural scene analysis by incorporating data-driven priors like monocular depth estimation and object segmentation to resolve motion and depth ambiguities originating from the monocular captures. In addition to guiding the optimization process, we show that these priors can be exploited to explicitly initialize our scene representation to drastically improve optimization speed and final image quality. As evidenced by our experimental evaluation, our dynamic point cloud model not only enables fast optimization and real-time frame rates for interactive applications, but also achieves competitive image quality on monocular benchmark sequences. Our code and data are available online: https://moritzkappel.github.io/projects/dnpc/.
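
To make the representation concrete, here is a minimal sketch of how separate hash-encoded feature grids might be queried for static and dynamic regions: the static grid is keyed on spatial position alone, the dynamic grid on position plus normalized time. It assumes a single-level, Instant-NGP-style spatial hash with randomly initialized tables; the paper's model uses learned multi-resolution grids, so treat this purely as an illustration.

```python
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861, 3674653429], dtype=np.uint64)

def hash_lookup(table, coords, resolution):
    """Single-level spatial-hash feature lookup (Instant-NGP style).

    table:      (T, F) feature table.
    coords:     (N, D) points; D=3 for the static grid, D=4 (x, y, z, t)
                for the time-conditioned dynamic grid.
    resolution: voxel grid resolution applied before hashing.
    """
    T = table.shape[0]
    idx = np.floor(coords * resolution).astype(np.uint64)
    # XOR-fold the integer coordinates with large primes into a table index.
    h = np.zeros(coords.shape[0], dtype=np.uint64)
    for d in range(coords.shape[1]):
        h ^= idx[:, d] * PRIMES[d]
    return table[h % T]

rng = np.random.default_rng(0)
static_table = rng.standard_normal((2**14, 8)).astype(np.float32)
dynamic_table = rng.standard_normal((2**14, 8)).astype(np.float32)

pts = rng.random((5, 3))   # sampled point positions in [0, 1)^3
t = np.full((5, 1), 0.25)  # normalized frame time
feat_static = hash_lookup(static_table, pts, resolution=64)
feat_dynamic = hash_lookup(dynamic_table, np.concatenate([pts, t], axis=1), resolution=64)
print(feat_static.shape, feat_dynamic.shape)  # (5, 8) (5, 8)
```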
Problem

Research questions and friction points this paper is trying to address.

Dynamic reconstruction of non-rigid scenes from monocular video.
Efficient novel-view synthesis using dynamic neural point clouds.
Resolving motion and depth ambiguities in monocular captures.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic neural point cloud for scene representation (see the rendering sketch after this list)
Separate hash-encoded grids for static and dynamic regions
Monocular depth and object segmentation priors
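
As referenced above, a toy rendering sketch: projecting a sampled feature point cloud to the image plane. This simple z-buffer splat is a non-differentiable stand-in for the paper's fast differentiable rasterizer, and the resulting feature image would normally be decoded to RGB by the neural rendering network; all names here are hypothetical.

```python
import numpy as np

def rasterize_points(points, feats, fx, fy, cx, cy, H, W):
    """Project a feature point cloud to the image plane with a z-buffer.

    points: (N, 3) camera-space positions, feats: (N, F) per-point features.
    Each point splats its feature into one pixel; the nearest depth wins.
    """
    F = feats.shape[1]
    image = np.zeros((H, W, F), dtype=feats.dtype)
    zbuf = np.full((H, W), np.inf)
    z = points[:, 2]
    u = np.round(points[:, 0] / z * fx + cx).astype(int)
    v = np.round(points[:, 1] / z * fy + cy).astype(int)
    # Keep points that land inside the image and in front of the camera.
    ok = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    for ui, vi, zi, fi in zip(u[ok], v[ok], z[ok], feats[ok]):
        if zi < zbuf[vi, ui]:  # nearest point wins the pixel
            zbuf[vi, ui] = zi
            image[vi, ui] = fi
    return image  # feature image; a neural network would decode it to RGB

rng = np.random.default_rng(1)
pts = rng.uniform([-1, -1, 1], [1, 1, 3], size=(500, 3))
feats = rng.random((500, 8)).astype(np.float32)
img = rasterize_points(pts, feats, fx=32., fy=32., cx=16., cy=16., H=32, W=32)
print(img.shape)  # (32, 32, 8)
```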
👥 Authors
Moritz Kappel
Computer Graphics Lab, TU Braunschweig, Germany
Florian Hahlbohm
TU Braunschweig
View Synthesis, Image-Based Rendering, Neural Rendering, Real-Time Rendering, Gaussian Splatting
Timon Scholz
Computer Graphics Lab, TU Braunschweig, Germany
Susana Castillo
Computer Graphics Lab, TU Braunschweig, Germany
C. Theobalt
Max Planck Institute for Informatics, Saarland Informatics Campus, Germany
Martin Eisemann
Computer Graphics Lab, TU Braunschweig, Germany
Vladislav Golyanik
Senior Researcher, MPI for Informatics
3D reconstruction, neural rendering, generative models, quantum computer vision
M. Magnor
Computer Graphics Lab, TU Braunschweig, Germany