A Sonar-Visual Dataset for Cross-Modal Underwater Robot Perception

📅 2026-05-31
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the scarcity of paired sonar and visual data, a key limitation in underwater cross-modal perception. It introduces SOVIS, a synchronized sonar-optical dataset of over 76,000 paired frames collected across 17 dives at six sites in the Trondheimfjord, together with an end-to-end pipeline that cleans and synchronizes the sensor data and an interactive annotation tool for labeling the pairs. In a proof-of-concept fish detection task using a small labeled subset, the cross-modal approach achieves a 7x improvement in mAP@0.10 over a monocular camera baseline. The dataset is intended to enable further research directions such as dense sonar prediction from monocular images.
📝 Abstract
Underwater robots typically use both cameras and sonar for perception to leverage the rich semantic details of vision and the robust range measurements of acoustics. However, learning to map between these modalities via cross-modal prediction remains underexplored due to limited sonar-visual paired datasets. We present SOVIS, a sonar-visual dataset for cross-modal underwater perception. SOVIS comprises over 76,000 paired frames collected across 17 dives at six sites in the Trondheimfjord, supported by an end-to-end pipeline that cleans and synchronizes the cross-modal sensor data. We also introduce an interactive annotation tool designed to accelerate the labeling process for this paired data. Finally, we demonstrate a proof-of-concept cross-modal fish detection task using a small subset of labeled data, achieving a 7x improvement in mAP@0.10 over a monocular camera baseline. SOVIS serves as the first step toward advancing cross-modal underwater perception research, enabling research directions such as dense sonar prediction from monocular images.
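For readers unfamiliar with the reported metric: mAP@0.10 conventionally means that a predicted box counts as a correct detection when its intersection over union (IoU) with a ground-truth box is at least 0.10, a much looser criterion than the common 0.50. The following is a minimal sketch of that matching rule only. The function names and the (x1, y1, x2, y2) box format are illustrative assumptions and are not taken from the SOVIS evaluation code.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    This box format is an assumption made for illustration.
    """
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, truth, threshold=0.10):
    """Hypothetical helper: a prediction matches when IoU >= threshold."""
    return iou(pred, truth) >= threshold


# Two 10x10 boxes offset by 5 in each direction: IoU = 25 / 175.
truth = (0.0, 0.0, 10.0, 10.0)
pred = (5.0, 5.0, 15.0, 15.0)

loose_match = is_true_positive(pred, truth)                   # threshold 0.10
strict_match = is_true_positive(pred, truth, threshold=0.50)  # threshold 0.50
```

In this example the same prediction is accepted at the 0.10 threshold and rejected at 0.50, which illustrates why results reported at mAP@0.10 should not be compared directly with results reported at stricter thresholds.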
Problem

Research questions and friction points this paper is trying to address.

cross-modal perception
underwater robotics
sonar-visual dataset
sensor fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modal perception
sonar-visual dataset
underwater robotics
sensor fusion
interactive annotation