🤖 AI Summary
To address the high cost and limited scalability of the multi-sensor setups commonly used for 3D vehicle annotation, this paper proposes ToosiCubix, a lightweight interactive method that requires only a single RGB image and the camera intrinsics. The user provides approximately ten clicks per vehicle on key parts, such as wheel centers, logos, and symmetric structures. From these clicks the method estimates the vehicle's position, orientation, and dimensions up to a scale ambiguity (8 DoF). The part-level geometric constraints (e.g., axle parallelism, body symmetry) are formulated as an optimization problem and solved by coordinate descent, alternating between Perspective-n-Point (PnP) and least-squares subproblems. Probabilistic size priors then handle the remaining ambiguities in scale and in unobserved dimensions, yielding a full 9-DoF cuboid placement. The annotations are validated against KITTI and Cityscapes3D, and the method is presented as a cost-effective, scalable way to add high-quality 3D cuboid annotations to existing monocular datasets.
📝 Abstract
Many existing methods for 3D cuboid annotation of vehicles rely on expensive and carefully calibrated camera-LiDAR or stereo setups, limiting their accessibility for large-scale data collection. We introduce ToosiCubix, a simple yet powerful approach for annotating ground-truth cuboids using only monocular images and intrinsic camera parameters. Our method requires only about 10 user clicks per vehicle, making it highly practical for adding 3D annotations to existing datasets originally collected without specialized equipment. By annotating specific features (e.g., wheels, car badge, symmetries) across different vehicle parts, we accurately estimate each vehicle's position, orientation, and dimensions up to a scale ambiguity (8 DoF). The geometric constraints are formulated as an optimization problem, which we solve using a coordinate descent strategy, alternating between Perspective-n-Point (PnP) and least-squares subproblems. To handle common ambiguities such as scale and unobserved dimensions, we incorporate probabilistic size priors, enabling 9 DoF cuboid placements. We validate our annotations against the KITTI and Cityscapes3D datasets, demonstrating that our method offers a cost-effective and scalable solution for high-quality 3D cuboid annotation.
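The scale ambiguity and the role of a size prior can be illustrated with a small sketch. It is not the paper's formulation: it assumes an ideal pinhole camera and an independent Gaussian prior on length, width, and height, and the function names, intrinsics, and prior values are hypothetical. Under those assumptions, scaling a vehicle and its distance by the same factor leaves the image unchanged, and the prior selects the missing scale in closed form.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def scale_from_size_prior(dims, prior_mean, prior_std):
    """Scale s minimizing sum(((s * d - mu) / sigma) ** 2) over the dimensions.

    Setting the derivative to zero gives
    s = sum(d * mu / sigma^2) / sum(d^2 / sigma^2).
    """
    num = sum(d * mu / sd ** 2 for d, mu, sd in zip(dims, prior_mean, prior_std))
    den = sum(d * d / sd ** 2 for d, sd in zip(dims, prior_std))
    return num / den


# Hypothetical intrinsics and one cuboid corner in the camera frame.
intrinsics = dict(fx=700.0, fy=700.0, cx=600.0, cy=180.0)
corner = (1.0, 0.5, 10.0)

# Dimensions (length, width, height) known only up to a common scale.
dims_up_to_scale = (2.0, 0.8, 0.6)

# Assumed Gaussian size prior in metres (illustrative values only).
prior_mean = (5.0, 2.0, 1.5)
prior_std = (0.4, 0.1, 0.1)

s = scale_from_size_prior(dims_up_to_scale, prior_mean, prior_std)
metric_dims = tuple(s * d for d in dims_up_to_scale)
scaled_corner = tuple(s * c for c in corner)

# The scaled corner projects to the same pixel: one image cannot fix s.
pixel_before = project(corner, **intrinsics)
pixel_after = project(scaled_corner, **intrinsics)
```

Here `pixel_before` and `pixel_after` coincide, which is why clicks alone constrain only 8 of the 9 degrees of freedom, while `metric_dims` shows how a size prior can supply the ninth.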