RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of camera parameter optimization from a single RGB video in dynamic scenes, without motion masks or other priors, this paper proposes an end-to-end framework. Methodologically, it introduces: (1) sparse spatiotemporal hinge-like relation modeling via patch-wise tracking filters; (2) an outlier-aware joint optimization scheme that adaptively down-weights moving outliers using a Softplus robust loss; and (3) a two-stage optimization strategy that balances stability and efficiency by trading off between the Softplus limits and convex minima of the losses. Evaluated on five dynamic datasets, the method significantly improves camera pose estimation accuracy and convergence speed. It also enhances downstream 4D reconstruction quality, improving both 3D geometric consistency and 2D rendering fidelity. These results point toward dynamic-scene camera optimization and neural rendering that require neither motion priors nor explicit segmentation while remaining robust to scene dynamics.

📝 Abstract
Although COLMAP has long remained the predominant method for camera parameter optimization in static scenes, it is constrained by its lengthy runtime and its reliance on ground truth (GT) motion masks for application to dynamic scenes. Many efforts have attempted to improve it by incorporating additional priors as supervision, such as GT focal length, motion masks, 3D point clouds, camera poses, and metric depth; these, however, are typically unavailable in casually captured RGB videos. In this paper, we propose a novel method for more accurate and efficient camera parameter optimization in dynamic scenes, solely supervised by a single RGB video. Our method consists of three key components: (1) Patch-wise Tracking Filters, to establish robust and maximally sparse hinge-like relations across the RGB video. (2) Outlier-aware Joint Optimization, for efficient camera parameter optimization by adaptive down-weighting of moving outliers, without reliance on motion priors. (3) A Two-stage Optimization Strategy, to enhance stability and optimization speed via a trade-off between the Softplus limits and convex minima in the losses. We visually and numerically evaluate our camera estimates. To further validate accuracy, we feed the camera estimates into a 4D reconstruction method and assess the resulting 3D scenes and the rendered 2D RGB and depth maps. We perform experiments on 4 real-world datasets (NeRF-DS, DAVIS, iPhone, and TUM-dynamics) and 1 synthetic dataset (MPI-Sintel), demonstrating that our method estimates camera parameters more efficiently and accurately with a single RGB video as the only supervision.
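The outlier-aware joint optimization above rests on adaptively down-weighting residuals produced by moving points rather than masking them out. The paper's exact formulation is not given here; the following is a minimal illustrative sketch (the gating function, its thresholds, and all names are assumptions) in which a sigmoid gate, the derivative of Softplus, softly suppresses large residuals in place of a hard motion mask:

```python
import numpy as np

def outlier_weights(residuals, c=1.0, tau=0.25):
    """Smooth per-point inlier weights in [0, 1].

    sigmoid((c - |r|) / tau) is the derivative of the Softplus function,
    so it acts as a differentiable soft threshold: weights stay near 1
    for small residuals (static points) and decay toward 0 once |r|
    exceeds c (likely moving outliers). c and tau are assumed constants.
    """
    return 1.0 / (1.0 + np.exp(-(c - np.abs(residuals)) / tau))

# Toy demo: three near-zero inlier residuals and one moving outlier.
r = np.array([0.05, -0.1, 0.08, 3.0])
w = outlier_weights(r)
loss = np.sum(w * r**2)  # outlier contributes almost nothing
```

With these defaults the outlier at residual 3.0 receives a weight several orders of magnitude below the inliers, so it barely influences the weighted loss, which is the qualitative behavior an outlier-aware objective needs.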
Problem

Research questions and friction points this paper is trying to address.

Optimizing camera parameters in dynamic scenes using only RGB video
Eliminating reliance on ground truth motion masks and depth
Improving accuracy and efficiency without additional supervision priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

RGB-only supervised camera optimization
Patch-wise tracking filters for hinge relations
Outlier-aware joint optimization without motion priors
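The two-stage strategy trades the robustness of a Softplus-limited loss (stage one, bounded gradients, stable far from the solution) for the speed of a convex quadratic loss (stage two, fast convergence near the minimum). A hypothetical 1-D sketch, with a scalar parameter standing in for the camera parameters and all step sizes and thresholds assumed, might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def two_stage_estimate(obs, tau=0.5, iters=200, lr=0.1):
    """Illustrative two-stage fit of a scalar parameter t to noisy
    observations containing outliers (a stand-in for camera parameters)."""
    t = np.median(obs)  # rough initialization
    # Stage 1: robust loss rho(r) = tau * softplus(|r|/tau) has gradient
    # sign(r) * sigmoid(|r|/tau), bounded by 1, so outliers cannot
    # dominate the update direction.
    for _ in range(iters):
        r = t - obs
        grad = np.mean(np.sign(r) * sigmoid(np.abs(r) / tau))
        t -= lr * grad
    # Stage 2: near the basin, switch to a convex quadratic loss on
    # softly gated residuals; its minimum has a closed form.
    r = t - obs
    w = sigmoid((3.0 * tau - np.abs(r)) / tau)  # soft inlier gate
    return np.sum(w * obs) / np.sum(w)          # weighted least squares

# Toy demo: inliers around t = 2.0 plus two moving outliers.
obs = np.array([1.9, 2.0, 2.1, 2.05, 1.95, 10.0, 12.0])
t_hat = two_stage_estimate(obs)
```

Stage one walks the estimate into the correct basin despite the outliers at 10 and 12; stage two then snaps it to the weighted least-squares minimum of the inliers in a single closed-form step, mirroring the stability-versus-speed trade-off the paper describes.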
Fang Li
University of Illinois at Urbana-Champaign
Hao Zhang
University of Illinois at Urbana-Champaign
Narendra Ahuja
Donald Biggar Willett Professor, University of Illinois at Urbana-Champaign
Computer Vision