RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking

📅 2025-06-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses real-time, robust 6D object pose estimation and tracking from RGB-only video in dynamic scenes, without depth sensors. Methodologically, it proposes an end-to-end framework featuring: (1) a novel depth inference mechanism based on binary search and differentiable rendering, eliminating reliance on ground-truth depth maps; (2) integration of XMem for high-accuracy 2D object tracking, coupled with Kalman filtering and a state machine that model motion priors and enable recovery from occlusion; and (3) a scale-adaptive optimization module enabling online scale recovery for CAD models of unknown scale. Evaluated on standard benchmarks, including LM-O and T-LESS, the method achieves competitive accuracy while operating at over 30 FPS, and demonstrates significantly improved tracking stability and robustness under challenging conditions such as rapid motion, severe occlusion, and scale variation.
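The binary-search depth inference can be sketched as follows. The idea is that, for a true-scale CAD model under a pinhole camera, the rendered silhouette shrinks monotonically as the object moves farther away, so the depth that matches the observed 2D mask can be bracketed by bisection. This is a minimal illustration with a hypothetical `render_area_at` callback standing in for the render-and-compare step (the paper uses FoundationPose's renderer; these names and bounds are ours):

```python
def infer_depth(observed_area, render_area_at, z_min=0.1, z_max=5.0, tol=1e-4):
    """Binary-search the object depth z so that the rendered mask area
    matches the observed 2D mask area. Relies on the area shrinking
    monotonically with depth, which holds for a pinhole projection."""
    lo, hi = z_min, z_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if render_area_at(mid) > observed_area:
            lo = mid   # rendered object looks too large -> push it farther away
        else:
            hi = mid   # looks too small -> bring it closer
    return 0.5 * (lo + hi)
```

For example, with an idealized area model `area(z) = k / z**2`, the search recovers the depth at which the projected and observed areas agree, without ever reading a depth map.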

📝 Abstract
We introduce RGBTrack, a robust framework for real-time 6D pose estimation and tracking that operates solely on RGB data, eliminating the need for depth input in dynamic, precise object pose tracking tasks. Building on the FoundationPose architecture, we devise a novel binary search strategy combined with a render-and-compare mechanism to efficiently infer depth and generate robust pose hypotheses from true-scale CAD models. To maintain stable tracking in dynamic scenarios, including rapid movements and occlusions, RGBTrack integrates state-of-the-art 2D object tracking (XMem) with a Kalman filter and a state machine for proactive object pose recovery. In addition, RGBTrack's scale recovery module dynamically adapts CAD models of unknown scale using an initial depth estimate, enabling seamless integration with modern generative reconstruction techniques. Extensive evaluations on benchmark datasets demonstrate that RGBTrack's depth-free approach achieves competitive accuracy and real-time performance, making it a promising practical solution for applications in robotics, augmented reality, and computer vision. The source code will be made publicly available at https://github.com/GreatenAnoymous/RGBTrack.git.
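The Kalman filter plus state machine described in the abstract can be sketched conceptually: a constant-velocity motion prior predicts the pose through occlusions, and a state machine decides when to coast on the prediction versus declare the object lost and trigger re-detection. This is an illustrative sketch only; the class name, matrix values, and the two-state TRACKING/LOST machine are our simplifications, not the paper's implementation:

```python
import numpy as np

class PoseTracker:
    """Minimal sketch: constant-velocity Kalman filter on the object's 3D
    translation, plus a TRACKING/LOST state machine that coasts on the
    motion prior during occlusion. All values are illustrative defaults."""

    def __init__(self, dt=1 / 30, max_lost=15):
        self.x = np.zeros(6)               # state: [position (3), velocity (3)]
        self.P = np.eye(6)                 # state covariance
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)    # position += velocity * dt
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])  # observe position only
        self.Q = 1e-4 * np.eye(6)          # process noise
        self.R = 1e-2 * np.eye(3)          # measurement noise
        self.state, self.lost_frames, self.max_lost = "TRACKING", 0, max_lost

    def step(self, measurement):
        # Predict with the constant-velocity motion prior.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        if measurement is None:            # occluded: coast on the prediction
            self.lost_frames += 1
            if self.lost_frames > self.max_lost:
                self.state = "LOST"        # hand off to global re-detection
        else:                              # update with the 2D-tracker estimate
            y = measurement - self.H @ self.x
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ y
            self.P = (np.eye(6) - K @ self.H) @ self.P
            self.state, self.lost_frames = "TRACKING", 0
        return self.x[:3], self.state
```

The key design point is that missed detections do not immediately reset the tracker: the filter keeps extrapolating the pose for a bounded number of frames, which is what enables smooth recovery after short occlusions.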
Problem

Research questions and friction points this paper is trying to address.

Real-time 6D pose estimation without depth input
Robust tracking under rapid movements and occlusions
Dynamic scale adaptation for unknown CAD models
Innovation

Methods, ideas, or system contributions that make the work stand out.

RGB-only 6D pose estimation without depth
Binary search with render-and-compare mechanism
Dynamic scale recovery for CAD models
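The scale-recovery idea above can be illustrated with a one-line pinhole argument: an object of metric diameter D at depth z spans roughly f * D / z pixels, so comparing the observed pixel extent with the extent predicted from the (possibly wrongly scaled) CAD model yields the missing scale factor. The function name and the pinhole simplification here are ours, used only to convey the geometry:

```python
def recover_scale(model_diameter, observed_px, depth, focal_px):
    """Sketch of online scale recovery for a CAD model of unknown scale.
    With a pinhole camera, a model of diameter `model_diameter` at `depth`
    should span about focal_px * model_diameter / depth pixels; the ratio
    of observed to predicted extent is the correction to apply."""
    predicted_px = focal_px * model_diameter / depth
    return observed_px / predicted_px
```

For instance, a model stored at twice its real size would be predicted to span twice as many pixels as observed, giving a recovered scale of 0.5 to shrink it back to true size.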
Teng Guo
Department of Computer Science, Rutgers, the State University of New Jersey, Piscataway, NJ, USA
Jingjin Yu
Associate Professor of Computer Science, Rutgers University, New Brunswick; Roboticist
Algorithmic Foundations for Robotics