🤖 AI Summary
Monocular 3D perception for railway L4 autonomous trains faces severe challenges in ultra-long-range scenarios (>1 km), where conventional methods suffer from poor depth estimation accuracy and low recall for small, distant objects (e.g., track intrusions and pedestrians).
Method: We propose a LiDAR-supervised, segmented monocular long-range 3D detection framework featuring dual detection heads—short-range (≤250 m) and long-range (>250 m)—to overcome monocular depth ambiguity at distance. A YOLOv9-enhanced 2.5D detector is jointly optimized with a LiDAR-guided depth estimation network.
Contribution/Results: On the OSDaR23 dataset, the method achieves the first robust monocular 3D detection of objects up to 250 m away, significantly improving recall for small, distant objects. It delivers the first production-viable, safety-critical long-range monocular 3D perception solution for railway automation.
📝 Abstract
Railway systems, particularly in Germany, require high levels of automation to address legacy infrastructure challenges and increase train traffic safely. A key component of automation is robust long-range perception, essential for early hazard detection, such as obstacles at level crossings or pedestrians on tracks. Unlike automotive systems with braking distances of ~70 meters, trains require perception ranges exceeding 1 km. This paper presents a deep-learning-based approach to long-range 3D object detection tailored for autonomous trains. The method relies solely on monocular images, inspired by the Faraway-Frustum approach, and incorporates LiDAR data during training to improve depth estimation. The proposed pipeline consists of four key modules: (1) a modified YOLOv9 for 2.5D object detection, (2) a depth estimation network, (3) a short-range 3D detection head, and (4) a long-range 3D detection head. Evaluations on the OSDaR23 dataset demonstrate the effectiveness of the approach in detecting objects up to 250 meters away. Results highlight its potential for railway automation and outline areas for future improvement.
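To make the four-module pipeline concrete, here is a minimal inference-flow sketch. All function names, the dummy detections, and the pass-through depth refinement are illustrative assumptions, not the paper's implementation; only the overall structure (2.5D detector → depth estimate → range-dependent head selection at the 250 m split) follows the description above.

```python
# Hypothetical sketch of the pipeline's inference flow. Function names and
# dummy values are illustrative; they are not taken from the paper.

RANGE_SPLIT_M = 250.0  # boundary between the short- and long-range heads


def detect_2p5d(image):
    """Stand-in for the modified YOLOv9: yields 2D boxes plus a coarse
    per-object depth prior (hence '2.5D'). Dummy detections here."""
    return [
        {"box2d": (100, 200, 40, 80), "cls": "person", "depth_prior": 120.0},
        {"box2d": (310, 190, 10, 18), "cls": "person", "depth_prior": 600.0},
    ]


def estimate_depth(det):
    """Stand-in for the depth network. LiDAR supervises this network only
    during training; at inference it works from the monocular image. Here
    the coarse prior is simply passed through."""
    return det["depth_prior"]


def lift_to_3d(det, depth_m, head):
    """Stand-in for a 3D detection head turning a 2.5D box into a 3D box."""
    return {"cls": det["cls"], "depth_m": depth_m, "head": head}


def run_pipeline(image):
    """Route each detection to the short- or long-range head by its
    estimated depth, mirroring the dual-head design."""
    results = []
    for det in detect_2p5d(image):
        depth_m = estimate_depth(det)
        head = "short" if depth_m <= RANGE_SPLIT_M else "long"
        results.append(lift_to_3d(det, depth_m, head))
    return results
```

In this sketch the range split is a hard threshold on the estimated depth; the actual heads would be learned networks specialized for their respective depth regimes.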