🤖 AI Summary
Addressing the challenge of reconstructing high-fidelity 3D ground truth from multi-camera 2D annotations (bounding boxes or keypoints), this paper proposes a fully automatic 3D reconstruction framework. Leveraging multi-view geometry and nonlinear filtering, it integrates unscented Kalman filtering (UKF) for state estimation, homography-based projection, single-object tracking, and cross-camera fusion of 2D observations, without requiring depth sensors or a planar-ground assumption. The method robustly estimates each object's 3D position, orientation, and complete shape. To our knowledge, it is the first end-to-end approach to generate multi-camera 3D ground truth solely from 2D annotations, effectively handling occlusions and supporting arbitrary camera topologies. Evaluated on the CMC, Wildtrack, and Panoptic datasets, it achieves significantly higher 3D localization accuracy than state-of-the-art methods. This enables scalable, high-fidelity 3D supervision for applications such as autonomous driving and intelligent surveillance.
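The cross-camera fusion idea can be illustrated with a minimal unscented-transform measurement update: sigma points drawn from the current state belief are pushed through a camera's nonlinear projection, and the resulting pixel-space statistics drive a Kalman-style correction. Everything below is a sketch under invented assumptions (a 2D ground-plane state, a hypothetical homography `H_cam`, illustrative noise levels), not the paper's actual models:

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """Generate 2n+1 Julier sigma points and weights for an n-dim Gaussian."""
    n = mean.size
    S = np.linalg.cholesky((n + kappa) * cov)
    pts = [mean] + [mean + S[:, i] for i in range(n)] + [mean - S[:, i] for i in range(n)]
    w = np.array([kappa / (n + kappa)] + [0.5 / (n + kappa)] * (2 * n))
    return np.array(pts), w

def ukf_update(mean, cov, z, h, R, kappa=1.0):
    """One UKF measurement update with a nonlinear measurement function h."""
    pts, w = sigma_points(mean, cov, kappa)
    Z = np.array([h(p) for p in pts])        # project sigma points into the image
    z_pred = w @ Z                           # predicted pixel measurement
    dZ = Z - z_pred
    dX = pts - mean
    S = dZ.T @ (w[:, None] * dZ) + R         # innovation covariance
    C = dX.T @ (w[:, None] * dZ)             # state-measurement cross covariance
    K = C @ np.linalg.inv(S)                 # Kalman gain
    return mean + K @ (z - z_pred), cov - K @ S @ K.T

# Hypothetical ground-plane-to-pixel homography for one camera (illustrative values).
H_cam = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0,   0.0,   1.0]])

def h(x):
    """Project a ground-plane point (X, Y) to pixel coordinates."""
    p = H_cam @ np.array([x[0], x[1], 1.0])
    return p[:2] / p[2]

# Fuse one noisy pixel observation of an object truly located at (1, 2) meters.
mean, cov = np.zeros(2), np.eye(2) * 4.0     # vague prior on the ground plane
z = h(np.array([1.0, 2.0])) + 0.5            # observed pixel, ~0.5 px off
mean, cov = ukf_update(mean, cov, z, h, np.eye(2))
```

In a multi-camera setting, one such update would be run per view, each with its own homography and noise model, which is how independent 2D annotations from different cameras tighten a single 3D estimate.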
📝 Abstract
Accurate 3D ground truth estimation is critical for applications such as autonomous navigation, surveillance, and robotics. This paper introduces a novel method that uses an Unscented Kalman Filter (UKF) to fuse 2D bounding box or pose keypoint ground truth annotations from multiple calibrated cameras into accurate 3D ground truth. Leveraging human-annotated 2D ground truth, the proposed multi-camera single-object tracking algorithm transforms 2D image coordinates into robust 3D world coordinates through homography-based projection and UKF-based fusion. The algorithm processes multi-view data to estimate object positions and shapes while effectively handling challenges such as occlusion. We evaluate our method on the CMC, Wildtrack, and Panoptic datasets, demonstrating high accuracy in 3D localization compared to the available 3D ground truth. Unlike existing approaches that provide only ground-plane information, our method also outputs the full 3D shape of each object. Additionally, the algorithm offers a scalable and fully automatic solution for multi-camera systems using only 2D image annotations.
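The homography-based projection mentioned above can be sketched as mapping a bounding-box bottom-center pixel onto the ground plane via the inverse homography. The matrix `H` and the round-trip helpers below are hypothetical, chosen only to make the geometry concrete:

```python
import numpy as np

# Hypothetical homography mapping ground-plane coordinates (X, Y, 1)
# to homogeneous pixel coordinates for one calibrated camera.
H = np.array([[400.0,  20.0, 640.0],
              [ 10.0, 450.0, 360.0],
              [ 0.001, 0.002,  1.0]])

def ground_to_pixel(X, Y):
    """Forward projection of a ground-plane point into the image."""
    p = H @ np.array([X, Y, 1.0])
    return p[:2] / p[2]

def foot_to_ground(u, v):
    """Back-project a bounding-box bottom-center pixel (u, v) onto the
    ground plane by solving H w = (u, v, 1) and dehomogenizing."""
    w = np.linalg.solve(H, np.array([u, v, 1.0]))
    return w[:2] / w[2]

# Round trip: a ground point at (2, 3) meters projects into the image
# and back-projects to the same ground-plane location.
u, v = ground_to_pixel(2.0, 3.0)
recovered = foot_to_ground(u, v)
```

Per-camera back-projections like this give coarse ground-plane candidates; the UKF-based fusion then reconciles them across views into a single consistent 3D estimate.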