SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos

📅 2025-04-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Camera pose estimation in small-baseline dynamic videos—common in social-media footage—suffers from ambiguous features, accumulated pose drift, and insufficient triangulation constraints. Method: We propose an explicit-implicit collaborative pose optimization framework based on Gaussian Splatting: leveraging the first-frame Gaussian scene representation as a stable geometric prior, we freeze its parameters and differentiably rasterize DINOv2 visual feature maps for direct pose optimization in the rendered feature space—bypassing conventional feature matching and strong disparity assumptions. Contribution/Results: This is the first work to adapt Gaussian Splatting to small-baseline pose estimation. On the TUM-Dynamics dataset, our method significantly outperforms MonST3R and DROID-SLAM, achieving high-accuracy and robust camera pose estimation for small-baseline video sequences.
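The core idea—freeze the scene reconstruction, then optimize only the camera pose by gradient descent on a rendering residual—can be illustrated with a deliberately tiny 2D toy. Everything here (the point "scene", the 2D rigid pose, the analytic gradients) is a simplified stand-in for SmallGS's frozen 3D Gaussians and differentiable feature rasterization, not the paper's actual pipeline:

```python
import math

# Frozen "scene": fixed 2D points standing in for the frozen Gaussian means.
scene = [(-1.0, 0.5), (0.8, -0.3), (0.2, 1.1), (-0.6, -0.9), (1.2, 0.4), (-0.2, 0.0)]

def transform(theta, tx, ty, pts):
    """Apply a 2D rigid pose (rotation theta, translation tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]

# "Observed frame": the scene seen under a small ground-truth pose
# (the small-baseline setting: motion between frames is tiny).
true_pose = (0.05, 0.10, -0.05)
observed = transform(*true_pose, scene)

# Gradient descent on the mean squared residual w.r.t. the pose only.
# The scene stays frozen throughout -- the analogue of SmallGS freezing
# the Gaussian parameters and optimizing the camera viewpoint.
theta, tx, ty = 0.0, 0.0, 0.0
lr, n = 0.1, len(scene)
for _ in range(2000):
    c, s = math.cos(theta), math.sin(theta)
    g_theta = g_tx = g_ty = 0.0
    for (x, y), (ox, oy) in zip(scene, observed):
        px, py = c * x - s * y + tx, s * x + c * y + ty
        rx, ry = px - ox, py - oy          # per-point residual
        dpx, dpy = -s * x - c * y, c * x - s * y  # dR/dtheta applied to the point
        g_theta += 2.0 * (rx * dpx + ry * dpy) / n
        g_tx += 2.0 * rx / n
        g_ty += 2.0 * ry / n
    theta -= lr * g_theta
    tx -= lr * g_tx
    ty -= lr * g_ty

print(theta, tx, ty)  # converges to the true pose (0.05, 0.10, -0.05)
```

Because an exact fit exists and the baseline is small, the descent recovers the ground-truth pose; in SmallGS the same role is played by comparing rasterized DINOv2 feature maps instead of point positions.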

📝 Abstract
Dynamic videos with small baseline motions are ubiquitous in daily life, especially on social media. However, these videos present a challenge to existing pose estimation frameworks due to ambiguous features, drift accumulation, and insufficient triangulation constraints. Gaussian splatting, which maintains an explicit representation for scenes, provides reliable novel-view rasterization when the viewpoint change is small. Inspired by this, we propose SmallGS, a camera pose estimation framework specifically designed for small-baseline videos. SmallGS optimizes sequential camera poses using Gaussian splatting, reconstructing the scene from the first frame of each video segment to provide a stable reference for the rest. The temporal consistency of Gaussian splatting within limited viewpoint differences reduces the requirement for sufficient depth variation in traditional camera pose estimation. We further incorporate pretrained robust visual features, e.g., DINOv2, into Gaussian splatting, where high-dimensional feature map rendering enhances the robustness of camera pose estimation. By freezing the Gaussian splatting and optimizing camera viewpoints based on rasterized features, SmallGS effectively learns camera poses without requiring explicit feature correspondences or strong parallax motion. We verify the effectiveness of SmallGS on small-baseline videos from the TUM-Dynamics sequences, where it achieves impressive camera pose accuracy compared to MonST3R and DROID-SLAM in dynamic scenes. Our project page is at: https://yuxinyao620.github.io/SmallGS
Problem

Research questions and friction points this paper is trying to address.

Estimating camera poses in small-baseline dynamic videos
Addressing ambiguous features and drift in pose estimation
Enhancing robustness with Gaussian splatting and visual features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a frozen first-frame Gaussian splatting scene as a stable reference
Incorporates rasterized DINOv2 feature maps for robustness
Optimizes poses without explicit feature correspondences or strong parallax
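The "no explicit correspondences" point can be made concrete with a minimal sketch: when both the rasterized Gaussian features and the frame's DINOv2 features live on the same dense pixel grid, the loss is just a per-pixel comparison—no keypoint detection or matching stage. The grid sizes, feature dimension, and plain-list representation below are toy assumptions for illustration, not the paper's data layout:

```python
# Tiny dense "feature maps": H x W grids of D-dim vectors, standing in for
# the rasterized Gaussian feature map and the frame's DINOv2 feature map.
H, W, D = 2, 3, 4

def feature_loss(rendered, target):
    """Mean squared error over all pixels and channels.

    The loss is dense: every pixel contributes directly, so no keypoint
    detection or descriptor matching step is needed -- alignment comes
    from optimizing the pose that produced `rendered`.
    """
    total, count = 0.0, 0
    for i in range(H):
        for j in range(W):
            for d in range(D):
                diff = rendered[i][j][d] - target[i][j][d]
                total += diff * diff
                count += 1
    return total / count

# A perfectly aligned pose would render features identical to the target.
target = [[[0.1 * (i + j + d) for d in range(D)] for j in range(W)] for i in range(H)]
aligned = [[list(px) for px in row] for row in target]
print(feature_loss(aligned, target))   # 0.0 at perfect alignment

# A misaligned pose perturbs the rendered features, raising the loss.
misaligned = [[[v + 0.05 for v in px] for px in row] for row in target]
print(feature_loss(misaligned, target) > 0.0)
```

In SmallGS this scalar is differentiable with respect to the camera pose through the rasterizer, which is what lets the pose be optimized directly in feature space.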