3D Ground Truth Reconstruction from Multi-Camera Annotations Using UKF

📅 2025-11-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Addressing the challenge of reconstructing high-fidelity 3D ground truth from multi-camera 2D annotations (bounding boxes/keypoints), this paper proposes a fully automatic 3D reconstruction framework. Leveraging multi-view geometry and nonlinear filtering, it integrates unscented Kalman filtering (UKF) for state estimation, homography-based projection, single-object tracking, and cross-camera 2D observation fusion, without requiring depth sensors or planar ground assumptions. The method robustly estimates object 3D position, orientation, and complete shape. To our knowledge, it is the first end-to-end approach to generate multi-camera 3D ground truth solely from 2D annotations, effectively handling occlusions and supporting arbitrary camera topologies. Evaluated on the CMC, WildTrack, and Panoptic datasets, it achieves significantly higher 3D localization accuracy than state-of-the-art methods. This enables scalable, high-fidelity 3D supervision for applications such as autonomous driving and intelligent surveillance.
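The homography-based projection mentioned in the summary can be sketched as follows. This is an illustrative reconstruction, not the paper's code: it assumes a simple pinhole calibration and lifts the bottom-center ("foot") point of a 2D bounding box onto the world plane z = 0 (note the paper itself avoids a hard planar-ground assumption); all matrices and values are toy examples.

```python
import numpy as np

def ground_plane_homography(P):
    """For world points on the plane z = 0, the 3x4 projection matrix P
    acts only through its columns 0, 1, and 3, yielding a 3x3 homography
    H that maps ground-plane coordinates (x, y, 1) to image coordinates."""
    return P[:, [0, 1, 3]]

def image_to_ground(u, v, P):
    """Invert the ground-plane homography to lift an image point (u, v)
    back to (x, y) on the z = 0 world plane."""
    H = ground_plane_homography(P)
    xyw = np.linalg.inv(H) @ np.array([u, v, 1.0])
    return xyw[:2] / xyw[2]

# Toy calibration (assumed, not from the paper): identity rotation,
# camera 10 world units in front of the plane.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
Rt = np.hstack([np.eye(3), np.array([[0.0], [0.0], [10.0]])])
P = K @ Rt

x, y = image_to_ground(320.0, 240.0, P)  # principal point maps to the origin
```

With this calibration, a foot point 80 px right of the principal point lands at x = 1 on the ground plane (80 px / 800 px focal length at depth 10).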


📝 Abstract
Accurate 3D ground truth estimation is critical for applications such as autonomous navigation, surveillance, and robotics. This paper introduces a novel method that uses an Unscented Kalman Filter (UKF) to fuse 2D bounding-box or pose-keypoint ground-truth annotations from multiple calibrated cameras into accurate 3D ground truth. Leveraging human-annotated 2D ground truth, our multi-camera single-object tracking algorithm transforms 2D image coordinates into robust 3D world coordinates through homography-based projection and UKF-based fusion. The algorithm processes multi-view data to estimate object positions and shapes while effectively handling challenges such as occlusion. We evaluate our method on the CMC, WildTrack, and Panoptic datasets, demonstrating high 3D localization accuracy against the available 3D ground truth. Unlike existing approaches that provide only ground-plane information, our method also outputs the full 3D shape of each object. The algorithm thus offers a scalable, fully automatic solution for multi-camera systems that requires only 2D image annotations.
Problem

Research questions and friction points this paper is trying to address.

Fusing 2D multi-camera annotations into accurate 3D ground truth
Estimating full 3D object positions and shapes from 2D data
Handling occlusion challenges in multi-camera tracking systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

UKF fuses multi-camera 2D annotations into 3D
Homography projection converts 2D coordinates to 3D
Algorithm outputs full 3D shapes from 2D inputs
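The UKF fusion named in the bullets above can be illustrated with a minimal measurement update that stacks every camera's 2D observation of one object into a single measurement vector. Everything here (the 3D-position state, noise levels, toy calibrations, and the pinhole measurement model) is an assumption for the sketch, not the paper's filter.

```python
import numpy as np

def project(P, xw):
    """Pinhole projection of a 3D world point through a 3x4 matrix P."""
    uvw = P @ np.append(xw, 1.0)
    return uvw[:2] / uvw[2]

def sigma_points(x, Pcov, kappa=1.0):
    """Symmetric 2n+1 sigma points with simple kappa scaling."""
    n = len(x)
    S = np.linalg.cholesky((n + kappa) * Pcov)
    pts = [x] + [x + S[:, i] for i in range(n)] + [x - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 0.5 / (n + kappa))
    w[0] = kappa / (n + kappa)
    return np.array(pts), w

def ukf_update(x, Pcov, z, cams, R):
    """One UKF measurement update fusing all cameras' 2D observations."""
    X, w = sigma_points(x, Pcov)
    # Push each sigma point through every camera; stack the 2D projections.
    Z = np.array([np.concatenate([project(P, xi) for P in cams]) for xi in X])
    z_hat = w @ Z
    Pzz = sum(wi * np.outer(zi - z_hat, zi - z_hat) for wi, zi in zip(w, Z)) + R
    Pxz = sum(wi * np.outer(xi - x, zi - z_hat) for wi, xi, zi in zip(w, X, Z))
    K = Pxz @ np.linalg.inv(Pzz)
    return x + K @ (z - z_hat), Pcov - K @ Pzz @ K.T

# Two toy cameras viewing the scene from different directions (assumed setup).
Kmat = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
def cam(R, t):
    return Kmat @ np.hstack([R, t.reshape(3, 1)])
cams = [cam(np.eye(3), np.array([0.0, 0.0, 10.0])),
        cam(np.array([[0.0, 0, -1], [0, 1, 0], [1, 0, 0.0]]),
            np.array([0.0, 0.0, 10.0]))]

true_pos = np.array([1.0, 0.5, 0.0])
z = np.concatenate([project(P, true_pos) for P in cams])  # noiseless demo
x0, P0 = np.zeros(3), np.eye(3)      # prior: origin, unit covariance
R = np.eye(4)                        # 1 px^2 measurement noise per axis
x1, P1 = ukf_update(x0, P0, z, cams, R)
```

After one update the estimate moves from the origin close to the true position, and the posterior covariance shrinks, since two views with different optical axes together constrain all three coordinates.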
Authors

Linh Van Ma (Gwangju Institute of Science and Technology): Computer Vision, Random Finite Sets, Multi-object tracking
Unse Fatima (Department of Electrical Engineering and Computer Science, GIST, Korea)
Tepy Sokun Chriv (Department of Electrical Engineering and Computer Science, GIST, Korea)
Haroon Imran (Department of Electrical Engineering and Computer Science, GIST, Korea)
Moongu Jeon (Gwangju Institute of Science and Technology): Artificial intelligence, Machine learning, Computer vision, Autonomous driving