🤖 AI Summary
Existing monocular dynamic video modeling methods produce only 2D frames and generalize poorly to unseen scenes. Method: We propose the first egocentric, unbounded tri-plane representation for 4D scene modeling from monocular video. To achieve temporal coherence and joint geometric-semantic learning without explicit geometry supervision, we introduce a 4D-aware Transformer for self-supervised temporal feature aggregation, integrated with dynamic radiance field optimization over the implicit tri-plane representation. Contribution/Results: Our approach achieves state-of-the-art performance on the NVIDIA Dynamic Scenes dataset, demonstrates strong cross-scene generalization, and, crucially, enables large-scale, self-supervised 4D reconstruction of the physical world directly from monocular video, marking the first such capability.
📝 Abstract
We present a novel framework for dynamic radiance field prediction from monocular video streams. Unlike previous methods that primarily focus on predicting future frames, our method goes a step further by generating explicit 3D representations of the dynamic scene. The framework builds on two core designs. First, we adopt an ego-centric unbounded triplane to explicitly represent the dynamic physical world. Second, we develop a 4D-aware transformer that aggregates features from monocular videos to update the triplane. Coupling these two designs enables us to train the proposed model on large-scale monocular videos in a self-supervised manner. Our model achieves top results in dynamic radiance field prediction on the NVIDIA Dynamic Scenes dataset, demonstrating its strong performance on 4D physical world modeling. Moreover, our model shows superior generalizability to unseen scenarios. Notably, we find that our approach exhibits emergent capabilities for geometry and semantic learning.
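To make the two core designs more concrete, below is a minimal PyTorch sketch, not the released implementation: a tri-plane field queried through an unbounded contraction, plus a transformer block that aggregates time-stamped per-frame tokens into a tri-plane update. The module names, feature sizes, contraction function, and decoder head are all assumptions for illustration; the paper's actual architecture may differ.

```python
# Hypothetical sketch of an ego-centric unbounded triplane + 4D-aware aggregation.
# Everything here (contraction, dims, attention layout) is an assumption, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def contract_unbounded(xyz: torch.Tensor) -> torch.Tensor:
    """Map unbounded ego-centric coordinates into [-1, 1] (a mip-NeRF-360-style
    contraction is one plausible choice; the paper's exact mapping may differ)."""
    norm = xyz.norm(dim=-1, keepdim=True).clamp(min=1e-6)
    contracted = torch.where(norm <= 1, xyz, (2 - 1 / norm) * xyz / norm)
    return contracted / 2  # keep coordinates inside [-1, 1] for grid_sample


class TriplaneField(nn.Module):
    """Three axis-aligned feature planes (XY, XZ, YZ) queried by 3D points."""

    def __init__(self, res: int = 128, dim: int = 32):
        super().__init__()
        self.planes = nn.Parameter(torch.zeros(3, dim, res, res))
        self.decoder = nn.Sequential(nn.Linear(3 * dim, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        x = contract_unbounded(xyz)                        # (N, 3) in [-1, 1]
        coords = [x[..., [0, 1]], x[..., [0, 2]], x[..., [1, 2]]]
        feats = []
        for plane, uv in zip(self.planes, coords):
            grid = uv.view(1, -1, 1, 2)                    # (1, N, 1, 2)
            f = F.grid_sample(plane[None], grid, align_corners=True)
            feats.append(f.view(plane.shape[0], -1).t())   # (N, dim)
        return self.decoder(torch.cat(feats, dim=-1))      # (N, 4): density + RGB


class FourDAwareAggregator(nn.Module):
    """Transformer block fusing per-frame image tokens (with time embeddings)
    into an additive update for the flattened triplane tokens."""

    def __init__(self, dim: int = 32, heads: int = 4):
        super().__init__()
        self.time_embed = nn.Linear(1, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.dim = dim

    def forward(self, planes: torch.Tensor, frame_tokens: torch.Tensor,
                times: torch.Tensor) -> torch.Tensor:
        # planes: (3, dim, res, res); frame_tokens: (T, L, dim); times: (T,)
        queries = planes.flatten(2).permute(0, 2, 1).reshape(1, -1, self.dim)
        keys = (frame_tokens + self.time_embed(times[:, None, None])).reshape(1, -1, self.dim)
        update, _ = self.attn(queries, keys, keys)
        update = update.reshape(3, -1, self.dim).permute(0, 2, 1).reshape_as(planes)
        return planes + update
```

In this reading, flattening the three planes into query tokens lets every spatial cell attend to tokens from all input frames at once, which is one way the self-supervised temporal aggregation described above could be realized; the updated planes would then be queried along camera rays and supervised purely by photometric rendering loss.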