🤖 AI Summary
To address the keyframe selection challenge in real-time 3D reconstruction of dynamic scenes, this paper proposes an adaptive keyframe filtering method that alleviates the perception data bottleneck while improving reconstruction quality and efficiency. The method combines photometric error and structural similarity (SSIM) error in an error-based evaluation module and adds a momentum-based mechanism that dynamically adjusts the selection threshold, so that selection adapts to the intensity of scene motion. It is designed to plug into recent 3D reconstruction frameworks such as Spann3r and CUT3R. Experimental results show significant improvements in reconstruction accuracy and temporal stability over baseline strategies, including fixed-interval and uniform sampling, under diverse dynamic conditions. Ablation studies validate the individual contributions of both the error-based evaluation module and the momentum-based threshold update mechanism.
📝 Abstract
We propose an adaptive keyframe selection method for improved 3D scene reconstruction in dynamic environments. The method integrates two complementary modules: an error-based selection module that uses photometric and structural similarity (SSIM) errors, and a momentum-based update module that dynamically adjusts the keyframe selection threshold according to scene motion dynamics. By dynamically curating the most informative frames, our approach addresses a key data bottleneck in real-time perception. This allows high-quality 3D world representations to be built from a compressed data stream, a critical step toward scalable robot learning and deployment in complex, dynamic environments. We evaluate the adaptive keyframe selection module on two recent state-of-the-art 3D reconstruction networks, Spann3r and CUT3R, and observe consistent improvements in reconstruction quality across both frameworks. Experimental results demonstrate significant improvements over traditional static keyframe selection strategies, such as fixed temporal intervals or uniform frame skipping. These findings represent a meaningful advance toward adaptive perception systems that can respond dynamically to complex and evolving visual scenes. Furthermore, an extensive ablation study confirms the effectiveness of each individual component, underlining its contribution to the overall performance gains.
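The two modules described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the error weighting, the SSIM window, or the exact threshold update rule, so everything here is an assumption made for illustration. The names `photometric_error`, `global_ssim`, `combined_error`, and `AdaptiveKeyframeSelector`, the weight `alpha`, the `scale` factor, and the exponential-moving-average form of the momentum update are all hypothetical. SSIM is computed over a single whole-image window rather than the usual sliding window, and frames are assumed to be grayscale float arrays in [0, 1].

```python
import numpy as np


def photometric_error(a, b):
    """Mean absolute intensity difference between two frames in [0, 1]."""
    return float(np.mean(np.abs(a - b)))


def global_ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM over one whole-image window (a simplification of windowed SSIM)."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(num / den)


def combined_error(a, b, alpha=0.5):
    """Assumed blend: alpha * photometric error + (1 - alpha) * (1 - SSIM)."""
    return alpha * photometric_error(a, b) + (1 - alpha) * (1 - global_ssim(a, b))


class AdaptiveKeyframeSelector:
    """Selects a frame as a keyframe when its error against the last keyframe
    exceeds a threshold that is itself updated with momentum (assumed EMA)."""

    def __init__(self, init_threshold=0.1, momentum=0.9, scale=1.0, alpha=0.5):
        self.threshold = init_threshold
        self.momentum = momentum
        self.scale = scale
        self.alpha = alpha
        self.keyframe = None

    def step(self, frame):
        frame = np.asarray(frame, dtype=np.float64)
        if self.keyframe is None:
            # The first frame is always kept as the initial keyframe.
            self.keyframe = frame
            return True
        err = combined_error(frame, self.keyframe, self.alpha)
        selected = bool(err > self.threshold)
        # Momentum update: the threshold drifts toward the recent error level,
        # rising when the scene moves quickly and falling when it is static.
        self.threshold = (self.momentum * self.threshold
                          + (1 - self.momentum) * self.scale * err)
        if selected:
            self.keyframe = frame
        return selected
```

In use, `step` would be called once per incoming frame, and only frames for which it returns `True` would be passed on to the reconstruction network. Under this assumed update rule, a burst of motion raises the threshold so that not every frame in the burst is kept, while a long static stretch lowers it so that small changes eventually trigger a new keyframe.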