Temporally Consistent Unsupervised Segmentation for Mobile Robot Perception

📅 2025-07-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address temporally consistent semantic terrain segmentation for mobile robots operating in unstructured environments, where labeled data are scarce, this paper proposes Frontier-Seg, an unsupervised video-level terrain segmentation method. The core innovation is the first incorporation of temporal consistency constraints into unsupervised terrain segmentation: robust superpixel-level features are extracted using foundation models (e.g., DINOv2), and cross-frame feature propagation coupled with consistency regularization drives the clustering so that traversable regions and terrain boundaries are identified stably. The method requires no human annotations. Evaluated on off-road benchmarks, including RUGD and RELLIS-3D, it achieves improvements in segmentation accuracy (+8.2% mIoU) and temporal stability (+23.6% inter-frame IoU). This advances reliable perception for autonomous navigation in open, unstructured environments.

📝 Abstract

Rapid progress in terrain-aware autonomous ground navigation has been driven by advances in supervised semantic segmentation. However, these methods rely on costly data collection and labor-intensive ground truth labeling to train deep models. Furthermore, autonomous systems are increasingly deployed in unrehearsed, unstructured environments where no labeled data exists and semantic categories may be ambiguous or domain-specific. Recent zero-shot approaches to unsupervised segmentation have shown promise in such settings but typically operate on individual frames, lacking temporal consistency, a critical property for robust perception in unstructured environments. To address this gap, we introduce Frontier-Seg, a method for temporally consistent unsupervised segmentation of terrain from mobile robot video streams. Frontier-Seg clusters superpixel-level features extracted from foundation model backbones (specifically DINOv2) and enforces temporal consistency across frames to identify persistent terrain boundaries, or frontiers, without human supervision. We evaluate Frontier-Seg on a diverse set of benchmark datasets, including RUGD and RELLIS-3D, demonstrating its ability to perform unsupervised segmentation across unstructured off-road environments.
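
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the Frontier-Seg implementation: it assumes dense per-pixel features are already available (in practice they might come from an upsampled DINOv2 feature map) and that a superpixel id map has been computed by some oversegmentation algorithm. The function names `pool_superpixel_features`, `farthest_point_init`, `assign`, and `kmeans` are hypothetical, and plain k-means with farthest-point initialization stands in for whatever clustering the paper actually uses.

```python
import numpy as np

def pool_superpixel_features(features, superpixels):
    """Average per-pixel features inside each superpixel.

    features:    (H, W, D) float array (assumed: upsampled backbone features).
    superpixels: (H, W) int array of superpixel ids in [0, S).
    Returns an (S, D) array holding the mean feature of each superpixel.
    """
    d = features.shape[-1]
    ids = superpixels.reshape(-1)
    flat = features.reshape(-1, d)
    n_sp = int(ids.max()) + 1
    sums = np.zeros((n_sp, d), dtype=np.float64)
    np.add.at(sums, ids, flat)  # unbuffered scatter-add, one row per pixel
    counts = np.bincount(ids, minlength=n_sp).astype(np.float64)
    return sums / np.maximum(counts, 1.0)[:, None]

def farthest_point_init(x, k):
    """Deterministic init: start at x[0], then repeatedly take the farthest point."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    return np.array(centers, dtype=np.float64)

def assign(x, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of x."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(x, k, iters=20):
    """Plain Lloyd iterations; returns (labels, centers)."""
    centers = farthest_point_init(x, k)
    for _ in range(iters):
        labels = assign(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return assign(x, centers), centers
```

Pooling before clustering means the clusterer sees one feature vector per superpixel rather than one per pixel, which keeps the problem small and lets segment boundaries follow the superpixel edges. A per-pixel label map can then be recovered with `labels[superpixels]`.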

Problem

Research questions and friction points this paper is trying to address.

Unsupervised terrain segmentation lacks temporal consistency

Existing supervised methods need costly labeled data for training

Ambiguous semantic categories in unstructured environments pose challenges

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised segmentation using foundation model features

Temporal consistency enforcement across video frames

Superpixel-level clustering for terrain boundary identification
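
One simple way to keep cluster identities stable across video frames is to carry the cluster centers forward from frame to frame and update them slowly. The sketch below illustrates that idea only. The source does not specify how Frontier-Seg enforces temporal consistency, so the momentum update, the function names `assign` and `segment_stream`, and the `momentum` parameter are all assumptions made for illustration.

```python
import numpy as np

def assign(feats, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def segment_stream(frames, init_centers, momentum=0.9):
    """Label each frame's superpixel features against slowly moving centers.

    frames:       iterable of (S_t, D) arrays, one per video frame.
    init_centers: (K, D) array, e.g. from clustering the first frame.
    momentum:     assumed smoothing factor; higher means slower center drift.
    Returns (list of per-frame label arrays, final centers).
    """
    centers = np.asarray(init_centers, dtype=np.float64).copy()
    all_labels = []
    for feats in frames:
        labels = assign(feats, centers)
        for j in range(len(centers)):
            members = feats[labels == j]
            if len(members):
                # Exponential moving average keeps cluster j anchored to its
                # history, so id j refers to the same terrain type over time.
                centers[j] = momentum * centers[j] + (1.0 - momentum) * members.mean(axis=0)
        all_labels.append(labels)
    return all_labels, centers
```

Because every frame is labeled against the same propagated centers, a cluster id refers to the same region of feature space throughout the stream, whereas re-clustering each frame independently could permute the ids arbitrarily. The trade-off is that a high `momentum` adapts slowly when the appearance of the terrain changes abruptly.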
