Semantically-Aware Diver Activity Recognition Framework for Effective Underwater Multi-Human-Robot Collaboration

2026-06-10
AI Summary
This work addresses how autonomous underwater vehicles (AUVs) can interpret diver behavior to support collaboration and safety in high-risk underwater environments. The authors propose DAR-Net, a framework that combines Transformer-based temporal modeling with pixel-level semantic supervision and uses a multi-task loss to align global activity recognition with local human–robot interaction semantics. They also present the first Underwater Diver Activity (UDA) dataset, comprising over 2,600 images with pixel-level mask annotations, to address data scarcity in this domain. In evaluations in a controlled environment, DAR-Net outperforms state-of-the-art models at recognizing six diver activity categories, providing a baseline for future work on collaborative underwater robotic systems.
Abstract
Effective multi-human-robot collaboration is essential for expanding human-led operations in the challenging and high-risk underwater environment. For autonomous underwater vehicles (AUVs) to become true teammates, they must be able to comprehend their surroundings and recognize a diver's activities to offer assistance and ensure safety. Towards this goal, we introduce DAR-Net, a novel transformer-based framework that analyzes complex underwater scenes to classify diver activities. Our contribution lies in a semantically guided learning formulation that couples transformer-based temporal reasoning with pixel-level scene supervision. This multi-loss training strategy explicitly aligns global activity recognition with local human-robot interaction semantics, which is particularly critical in low-visibility underwater conditions. To address the significant challenge of data scarcity in this domain, we present the first-ever Underwater Diver Activity (UDA) dataset, a foundational resource containing over 2,600 annotated images with pixel-level masks. Through rigorous experimental evaluations in a controlled environment, we demonstrate that DAR-Net achieves promising accuracy in recognizing six distinct diver activities, outperforming state-of-the-art models. While this dataset provides a crucial baseline, our work serves as a pioneering step, laying the groundwork for future research and facilitating the development of more intelligent, collaborative underwater robotic systems.
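The multi-loss idea described in the abstract, a global activity classification objective trained jointly with pixel-level scene supervision, can be illustrated with a minimal sketch. The abstract does not specify the actual loss terms, their weighting, or the mask format, so everything below is an assumption made for illustration: softmax cross-entropy for the six activity classes, binary cross-entropy over a flattened diver/background mask, and a hypothetical `seg_weight` that balances the two. It is not the authors' implementation.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy for one sample, given raw class scores and the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def pixel_bce(mask_probs, mask_true, eps=1e-7):
    """Mean binary cross-entropy over a flat list of per-pixel probabilities.

    Assumes a binary (diver vs. background) mask; the source may use
    multi-class masks instead.
    """
    total = 0.0
    for p, y in zip(mask_probs, mask_true):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(mask_true)


def multi_task_loss(activity_logits, activity_label, mask_probs, mask_true,
                    seg_weight=0.5):
    """Weighted sum of the global activity loss and the pixel-level loss.

    seg_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    cls_loss = softmax_cross_entropy(activity_logits, activity_label)
    seg_loss = pixel_bce(mask_probs, mask_true)
    return cls_loss + seg_weight * seg_loss


if __name__ == "__main__":
    # Six activity classes, a toy 4-pixel mask.
    logits = [0.2, 1.5, -0.3, 0.0, 0.1, -1.0]
    loss = multi_task_loss(logits, 1, [0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0])
    print(f"combined loss: {loss:.4f}")
```

In a sketch like this, a single scalar drives both objectives, so gradients from the pixel-level term would shape the same features used for activity classification; how DAR-Net actually couples the two is described in the paper itself.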
Problem

Research questions and friction points this paper is trying to address.

underwater human-robot collaboration
diver activity recognition
semantic scene understanding
data scarcity
low-visibility environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

transformer-based activity recognition
semantically-guided learning
pixel-level supervision
underwater human-robot collaboration
diver activity dataset