🤖 AI Summary
This paper tackles unsupervised monocular road segmentation with a framework that self-generates weak labels, eliminating the need for manual annotation. Methodologically, it combines monocular geometric priors (depth and surface-normal estimation) with optical-flow-guided feature-point tracking across frames to produce reliable weak labels. To further refine label quality and temporal stability, the framework incorporates mutual information maximization and temporal consistency constraints during joint optimization of the segmentation model. The key contributions are: (i) the first integration of geometric and motion cues for collaborative modeling in unsupervised road segmentation; and (ii) the application of information-theoretic principles to enforce cross-frame prediction consistency. Evaluated on Cityscapes, the method achieves 82.0% IoU, matching fully supervised baselines while demonstrating superior temporal robustness.
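The cross-frame consistency idea can be sketched as a penalty over tracked point pairs: predictions for corresponding points in consecutive frames should agree. The sketch below is an illustrative stand-in, not the paper's actual objective; the function name, the track format, and the use of a simple mismatch fraction (rather than mutual information maximization) are all assumptions.

```python
import numpy as np

def temporal_consistency_penalty(pred_t, pred_t1, tracks):
    """Fraction of tracked feature points whose predicted labels
    disagree between consecutive frames.

    pred_t, pred_t1 : (H, W) integer label maps for frames t and t+1.
    tracks          : list of ((x_t, y_t), (x_t1, y_t1)) point
                      correspondences, e.g. from optical flow.

    NOTE: hedged stand-in for the paper's information-theoretic term;
    the actual method maximizes mutual information across frames
    rather than counting hard label mismatches.
    """
    mismatches = sum(
        pred_t[y0, x0] != pred_t1[y1, x1]
        for (x0, y0), (x1, y1) in tracks
    )
    return mismatches / max(len(tracks), 1)

# Toy usage: two 4x4 label maps, one tracked point changes label.
pred_t = np.zeros((4, 4), dtype=int)
pred_t1 = np.zeros((4, 4), dtype=int)
pred_t1[2, 2] = 1
tracks = [((0, 0), (0, 0)), ((2, 2), (2, 2))]
penalty = temporal_consistency_penalty(pred_t, pred_t1, tracks)  # 0.5
```

In training, such a penalty (or its mutual-information counterpart) would be added to the segmentation loss so that the network is discouraged from flipping labels along tracked trajectories.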
📝 Abstract
This paper presents a fully unsupervised approach to binary road segmentation (road vs. non-road), eliminating reliance on costly manually labeled datasets. The method leverages scene geometry and temporal cues to distinguish road from non-road regions. Weak labels are first generated from geometric priors: pixels above the horizon are marked non-road, and a predefined quadrilateral in front of the vehicle is marked road. In a refinement stage, temporal consistency is enforced by tracking local feature points across frames and penalizing inconsistent label assignments via mutual information maximization, improving both precision and temporal stability. On the Cityscapes dataset, the model achieves an Intersection-over-Union (IoU) of 0.82, demonstrating high accuracy with a simple design. These results highlight the potential of combining geometric constraints and temporal consistency for scalable unsupervised road segmentation in autonomous driving.
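The geometric weak-labeling step described above can be sketched in a few lines: everything above an assumed horizon line becomes non-road, and a quadrilateral in front of the camera (here simplified to an axis-symmetric trapezoid) becomes road, with the rest left unlabeled. All fractions below are illustrative defaults, not values from the paper.

```python
import numpy as np

def weak_road_labels(h, w, horizon_frac=0.45, road_top_frac=0.65,
                     top_half_width=0.35, bottom_margin=0.05):
    """Weak labels from geometric priors: 0 = non-road, 1 = road,
    255 = unknown (ignored during training).

    Pixels above an assumed horizon line are marked non-road; a
    trapezoid in front of the (assumed centered) camera is marked
    road. Parameter values are illustrative assumptions.
    """
    labels = np.full((h, w), 255, dtype=np.uint8)
    labels[: int(h * horizon_frac), :] = 0  # above horizon -> non-road

    # Trapezoidal road prior: narrow near the horizon, wide at the
    # bottom edge of the image.
    top_y = int(h * road_top_frac)
    cx = w // 2
    for y in range(top_y, h):
        t = (y - top_y) / max(h - 1 - top_y, 1)
        half = int(w * ((1 - t) * top_half_width
                        + t * (0.5 - bottom_margin)))
        labels[y, cx - half:cx + half] = 1
    return labels

labels = weak_road_labels(100, 200)
```

Pixels labeled 255 carry no supervision signal; the segmentation network is trained only on the confidently labeled road and non-road regions, and the temporal refinement stage then propagates labels into the unknown areas.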