🤖 AI Summary
This report addresses the automatic detection and segmentation of rip currents, whose visual appearance varies substantially across beaches, camera viewpoints, and sea states. The NTIRE 2026 RipDetSeg Challenge builds on the RipVIS benchmark, with data from more than ten countries, four camera orientations, and diverse beach and sea conditions. Both detection and segmentation are evaluated, and final rankings use a composite score that combines F1 and F2 measured at multiple thresholds. Most participant solutions combined pretrained vision models with strong data augmentation and post-processing design. The challenge attracted 159 registered participants and produced nine valid test submissions across the two tasks. The results suggest that rip current understanding benefits strongly from progress in general-purpose vision models, while leaving ample room for methods tailored to the unique visual structure of rip currents.
📝 Abstract
This report presents the NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge, which targets automatic rip current understanding in images. Rip currents are hazardous nearshore flows that cause many beach-related fatalities worldwide, yet they remain difficult to identify because their visual appearance varies substantially across beaches, viewpoints, and sea states. To advance research on this safety-critical problem, the challenge builds on the RipVIS benchmark and evaluates both detection and segmentation. The dataset is sourced from more than $10$ countries and covers $4$ camera orientations and diverse beach and sea conditions. This report describes the dataset, challenge protocol, evaluation methodology, and final results, and summarizes the main insights from the submitted methods. The challenge attracted $159$ registered participants and produced $9$ valid test submissions across the two tasks. Final rankings are based on a composite score that combines $F_1[50]$, $F_2[50]$, $F_1[40\!:\!95]$, and $F_2[40\!:\!95]$. Most participant solutions relied on pretrained models, combined with strong augmentation and post-processing design. These results suggest that rip current understanding benefits strongly from progress in robust general-purpose vision models, while leaving ample room for future methods tailored to the unique visual structure of rip currents.
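To make the composite score more concrete, the following is a minimal Python sketch of how its four components might be computed. Several things here are assumptions and are not stated in the abstract: that the bracketed numbers are IoU thresholds in percent, that $[40\!:\!95]$ means an average over thresholds from 0.40 to 0.95 in steps of 0.05, that each prediction has already been matched one-to-one with a ground-truth instance, and that the four terms are combined with equal weights. The function names are hypothetical; only the $F_\beta$ formula itself is standard.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """Standard F-beta score from true positive, false positive, false negative counts."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def f_beta_at(ious, num_gt, threshold, beta):
    """F-beta at one IoU threshold.

    ious: one IoU per prediction (0.0 if unmatched); assumes one-to-one
    matching to ground truth was done beforehand.
    """
    tp = sum(1 for iou in ious if iou >= threshold)
    fp = len(ious) - tp
    fn = num_gt - tp
    return f_beta(tp, fp, fn, beta)


def f_beta_range(ious, num_gt, beta):
    """Mean F-beta over thresholds 0.40, 0.45, ..., 0.95 (assumed step of 0.05)."""
    thresholds = [t / 100 for t in range(40, 100, 5)]
    return sum(f_beta_at(ious, num_gt, t, beta) for t in thresholds) / len(thresholds)


def composite_score(ious, num_gt):
    """Equal-weight mean of the four components (the weighting is an assumption)."""
    parts = [
        f_beta_at(ious, num_gt, 0.50, beta=1.0),  # F1[50]
        f_beta_at(ious, num_gt, 0.50, beta=2.0),  # F2[50]
        f_beta_range(ious, num_gt, beta=1.0),     # F1[40:95]
        f_beta_range(ious, num_gt, beta=2.0),     # F2[40:95]
    ]
    return sum(parts) / len(parts)


if __name__ == "__main__":
    # Toy example: four predictions, four ground-truth instances.
    print(composite_score([0.9, 0.6, 0.45, 0.0], num_gt=4))
```

Because $F_2$ weights recall more heavily than precision, including it alongside $F_1$ penalizes missed rip currents more than false alarms, which is consistent with the safety-critical framing of the task.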