🤖 AI Summary
To address the high latency and computational overhead of road segmentation with frame-based cameras in autonomous driving, this paper proposes a self-supervised semantic segmentation method tailored to event cameras. To overcome the scarcity of annotated event data and the reliance on cross-modal pretraining, it introduces the first self-supervised contrastive learning framework designed specifically for the event domain, augmented with a probabilistic attention mechanism that models the spatiotemporal uncertainty inherent in event streams. The method requires neither RGB images nor pretrained weights and segments road regions solely from sparse, asynchronous events. Evaluated on the DSEC-Semantic and DDD17 benchmarks, it attains state-of-the-art performance with minimal supervision (<0.1% pixel-level labels), improving both segmentation accuracy and real-time inference capability while exploiting the event camera’s ultra-low power consumption and microsecond-level latency.
📝 Abstract
Road segmentation is pivotal for autonomous vehicles, yet achieving low-latency, low-compute solutions with frame-based cameras remains a challenge. Event cameras offer a promising alternative. To leverage their low-power sensing, we introduce EventSSEG, a method for road segmentation that uses event-only computing and a probabilistic attention mechanism. Event-only computing makes it difficult to transfer pretrained weights from the conventional camera domain, so training would ordinarily require abundant labeled data, which is scarce. To overcome this, EventSSEG employs event-based self-supervised learning, eliminating the need for extensive labeled data. Experiments on DSEC-Semantic and DDD17 show that EventSSEG achieves state-of-the-art performance with minimal labeled events. This approach maximizes event cameras' capabilities and addresses the lack of labeled events.