🤖 AI Summary
Precise localization of a leader underwater vehicle during multi-vehicle collaborative operations remains challenging in dynamic, complex underwater environments.
Method: This paper proposes an end-to-end deep learning localization framework integrating optical, acoustic, and pressure sensing modalities. We design a heterogeneous modality alignment and feature-level fusion neural network architecture to synergistically combine high-resolution visual localization, long-range acoustic ranging, and environmental pressure sensing. Additionally, we introduce an underwater-specific calibration strategy and lightweight inference optimization.
Results: Evaluated on a custom-built underwater test platform, the proposed method significantly outperforms single- and dual-modal baselines, achieving a 42% reduction in mean localization error under dynamic, complex conditions. The approach delivers both high accuracy and strong robustness, demonstrating practical viability for real-world underwater multi-agent coordination.
📝 Abstract
Underwater vehicles have emerged as a critical technology for exploring and monitoring aquatic environments. The deployment of multi-vehicle systems has gained substantial interest due to their capability to perform collaborative tasks with improved efficiency. However, achieving precise localization of a leader underwater vehicle within a multi-vehicle configuration remains a significant challenge, particularly in dynamic and complex underwater conditions. To address this issue, this paper presents a novel tri-modal sensor fusion neural network approach that integrates optical, acoustic, and pressure sensors to localize the leader vehicle. The proposed method leverages the unique strengths of each sensor modality to improve localization accuracy and robustness. Specifically, optical sensors provide high-resolution imaging for precise relative positioning, acoustic sensors enable long-range detection and ranging, and pressure sensors offer environmental context awareness. The fusion of these sensor modalities is implemented using a deep learning architecture designed to extract and combine complementary features from raw sensor data. The effectiveness of the proposed method is validated through a custom-designed testing platform. Extensive data collection and experimental evaluations demonstrate that the tri-modal approach significantly improves the accuracy and robustness of leader localization, outperforming both single-modal and dual-modal methods.
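The abstract describes feature-level fusion: each modality (optical, acoustic, pressure) is encoded separately, the embeddings are concatenated, and a shared head regresses the leader's relative position. The paper's actual architecture is not given here, so the following is only a minimal NumPy sketch of that fusion pattern; all layer dimensions and the random stand-in weights are hypothetical, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(42)

def linear(in_dim, out_dim):
    """Random linear layer (weights, bias) standing in for trained parameters."""
    return rng.standard_normal((out_dim, in_dim)) * 0.1, np.zeros(out_dim)

def forward(layer, x):
    W, b = layer
    return W @ x + b

# Per-modality encoders (hypothetical feature dimensions).
enc_optical  = linear(128, 32)   # flattened image features  -> 32-D embedding
enc_acoustic = linear(16, 32)    # range/bearing features    -> 32-D embedding
enc_pressure = linear(4, 32)     # depth/pressure context    -> 32-D embedding

# Fusion head: concatenated embeddings -> 3-D relative position of the leader.
fusion_head = linear(96, 3)

def localize(optical_feat, acoustic_feat, pressure_feat):
    """Feature-level fusion: encode each modality, concatenate, regress position."""
    z = np.concatenate([
        np.tanh(forward(enc_optical, optical_feat)),
        np.tanh(forward(enc_acoustic, acoustic_feat)),
        np.tanh(forward(enc_pressure, pressure_feat)),
    ])
    return forward(fusion_head, z)

pos = localize(rng.standard_normal(128),
               rng.standard_normal(16),
               rng.standard_normal(4))
print(pos.shape)  # (3,)
```

Concatenation is the simplest feature-level fusion; attention-based or gated fusion would let the network down-weight a degraded modality (e.g. optical features in turbid water), which is the kind of robustness the tri-modal design targets.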