Semantically-Aware Diver Activity Recognition Framework for Effective Underwater Multi-Human-Robot Collaboration

📅 2026-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses how autonomous underwater vehicles (AUVs) can interpret diver behavior to support collaboration and safety in high-risk underwater environments. The authors propose DAR-Net, a transformer-based framework that couples temporal modeling with pixel-level semantic supervision. Its multi-loss training strategy aligns global activity recognition with local human-robot interaction semantics, which the authors consider particularly important in low-visibility conditions. To address data scarcity, they present the first Underwater Diver Activity (UDA) dataset, containing over 2,600 annotated images with pixel-level masks. In evaluations conducted in a controlled environment, DAR-Net outperforms state-of-the-art models at recognizing six diver activities. The authors position the work as groundwork for more intelligent, collaborative underwater robotic systems.

📝 Abstract

Effective multi-human-robot collaboration is essential for expanding human-led operations in the challenging and high-risk underwater environment. For autonomous underwater vehicles (AUVs) to become true teammates, they must be able to comprehend their surroundings and recognize a diver's activities in order to offer assistance and ensure safety. Toward this goal, we introduce DAR-Net, a novel transformer-based framework that analyzes complex underwater scenes to classify diver activities. Our contribution lies in a semantically guided learning formulation that couples transformer-based temporal reasoning with pixel-level scene supervision. This multi-loss training strategy explicitly aligns global activity recognition with local human-robot interaction semantics, which is particularly critical in low-visibility underwater conditions. To address the significant challenge of data scarcity in this domain, we present the first Underwater Diver Activity (UDA) dataset, a foundational resource containing over 2,600 annotated images with pixel-level masks. Through experimental evaluations in a controlled environment, we demonstrate that DAR-Net achieves promising accuracy in recognizing six distinct diver activities, outperforming state-of-the-art models. The dataset provides a baseline, and this work is a first step that lays the groundwork for future research and for the development of more intelligent, collaborative underwater robotic systems.
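The multi-loss idea in the abstract, one term for the global activity label and one for pixel-level supervision, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which loss functions or weights DAR-Net uses. The choice of softmax cross-entropy for the six activity classes, binary cross-entropy for a foreground mask, the function names, and the `seg_weight` parameter are all assumptions made for illustration.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """Cross-entropy for one sample, computed from raw class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

def pixel_bce(mask_logits, mask_targets):
    """Mean binary cross-entropy over a flattened mask (assumed binary)."""
    total = 0.0
    for z, y in zip(mask_logits, mask_targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(mask_logits)

def multi_task_loss(activity_logits, activity_label,
                    mask_logits, mask_targets, seg_weight=0.5):
    """Weighted sum of a global activity loss and a pixel-level mask loss.

    seg_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    cls_loss = softmax_cross_entropy(activity_logits, activity_label)
    seg_loss = pixel_bce(mask_logits, mask_targets)
    return cls_loss + seg_weight * seg_loss, cls_loss, seg_loss

# Example: six activity classes, a tiny 4-pixel mask, uninformative predictions
total, cls_loss, seg_loss = multi_task_loss(
    [0.0] * 6, 2, [0.0] * 4, [1, 0, 1, 0], seg_weight=0.5
)
```

With all-zero logits the classification term equals ln 6 and the mask term equals ln 2, the losses of a model that guesses uniformly. In a real training loop both terms would be computed on batched tensors and minimized jointly, so that the gradient from the mask term shapes the features the activity classifier relies on.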

Problem

Research questions and friction points this paper is trying to address.

underwater human-robot collaboration

diver activity recognition

semantic scene understanding

data scarcity

low-visibility environment

Innovation

Methods, ideas, or system contributions that make the work stand out.

transformer-based activity recognition

semantically guided learning

pixel-level supervision