🤖 AI Summary
To address the strong reliance on manual calibration and low cross-modal association accuracy in radar-camera multi-object tracking, this paper proposes an online joint calibration and common-feature-driven multimodal tracking framework. Methodologically, it introduces the first end-to-end online extrinsic parameter estimation by leveraging structural commonalities between radar point cloud reflectivity and image texture, augmented by category-consistency constraints to refine feature matching—replacing conventional geometry-only coarse-grained association. A lightweight multimodal fusion module is further designed to jointly improve 3D localization accuracy and trajectory stability. Evaluated on real-world traffic and controlled scenarios, the method eliminates offline calibration, reduces 3D localization error by 21.3%, and improves IDF1 by 14.7%, significantly enhancing system robustness and deployment efficiency.
📝 Abstract
This paper presents a Multi-Object Tracking (MOT) framework that fuses radar and camera data to enhance tracking efficiency while minimizing manual interventions. Contrary to many studies that underutilize radar and assign it a supplementary role--despite its capability to provide accurate range/depth information of targets in a world 3D coordinate system--our approach positions radar in a crucial role. Meanwhile, this paper utilizes common features to enable online calibration to autonomously associate detections from radar and camera. The main contributions of this work include: (1) the development of a radar-camera fusion MOT framework that exploits online radar-camera calibration to simplify the integration of detection results from these two sensors, (2) the utilization of common features between radar and camera data to accurately derive real-world positions of detected objects, and (3) the adoption of feature matching and category-consistency checking to surpass the limitations of mere position matching in enhancing sensor association accuracy. To the best of our knowledge, we are the first to investigate the integration of radar-camera common features and their use in online calibration for achieving MOT. The efficacy of our framework is demonstrated by its ability to streamline the radar-camera mapping process and improve tracking precision, as evidenced by real-world experiments conducted in both controlled environments and actual traffic scenarios. Code is available at https://github.com/radar-lab/Radar_Camera_MOT