🤖 AI Summary
This work addresses efficient online 3D multi-object tracking and pose estimation from multiple monocular cameras, without relying on costly 3D annotations or computationally expensive deep models. The authors propose a fast online algorithm built on a Bayes-optimal multi-object tracking filter that takes only the outputs of pre-trained 2D detectors (bounding boxes and poses) as input. By fusing detections geometrically across cameras and optimizing multi-view association online, the method jointly infers 3D trajectories and object poses. It requires no 3D training data and remains robust when cameras disconnect and reconnect during operation. The approach runs significantly faster than state-of-the-art methods without compromising accuracy, which makes it practical for real-world multi-camera systems.
📝 Abstract
This paper proposes a fast, online method for jointly performing 3D multi-object tracking and pose estimation using multiple monocular cameras. Our algorithm requires only 2D bounding box and pose detections, eliminating the need for costly 3D training data or computationally expensive deep learning models. Our solution is an efficient implementation of a Bayes-optimal multi-object tracking filter that reduces computational cost while maintaining accuracy. Using only publicly available pre-trained 2D detection models, we demonstrate that our algorithm is significantly faster than state-of-the-art methods without compromising accuracy. We also illustrate the robust performance of our algorithm in scenarios where multiple cameras are intermittently disconnected and reconnected during operation.
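The abstract does not say how 2D detections from several cameras are combined into a 3D estimate, so the sketch below is only an illustration of one common building block, not the paper's filter. It assumes each online camera contributes a back-projected ray (camera center plus viewing direction through a 2D detection) and finds the 3D point closest to all available rays in the least-squares sense. The function names `fuse_rays` and `_solve_3x3`, the ray representation, and the use of `None` for a disconnected camera are all hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Ray = Tuple[Sequence[float], Sequence[float]]  # (camera center, viewing direction)


def _solve_3x3(a: List[List[float]], b: List[float]) -> Vec3:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("rays are (near-)parallel; the 3D point is not observable")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return (m[0][3] / m[0][0], m[1][3] / m[1][1], m[2][3] / m[2][2])


def fuse_rays(rays: Sequence[Optional[Ray]]) -> Optional[Vec3]:
    """Least-squares 3D point closest to all rays from cameras that are online.

    Each entry is (center, direction), or None if that camera is disconnected
    (an assumption of this sketch). Returns None with fewer than two active rays.
    """
    active = [r for r in rays if r is not None]
    if len(active) < 2:
        return None
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for center, direction in active:
        norm = sum(x * x for x in direction) ** 0.5
        d = [x / norm for x in direction]
        # Accumulate sum(I - d d^T) x = sum(I - d d^T) c over all active rays.
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += p
                b[i] += p * center[j]
    return _solve_3x3(a, b)
```

Because the normal equations are accumulated per ray, a camera that drops out simply stops contributing terms, and the estimate is recomputed from whichever cameras remain. A full tracker would additionally have to decide which detections in different views belong to the same object and propagate uncertainty over time, which this sketch does not attempt.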