Counting Fish with Temporal Representations of Sonar Video

📅 2025-02-07

🤖 AI Summary

Problem: Accurately estimating salmon migration counts at remote field sites, where compute is limited and network connectivity is unreliable, remains a significant challenge. Method: This paper proposes a lightweight counting method based on echograms, temporal representations that compress several hundred frames of imaging sonar video into a single image. A ResNet-18 model predicts upstream and downstream counts directly from the echogram of each 200-frame time window, and a set of domain-specific image augmentations together with a weakly-supervised training protocol further improves results. Results: On representative data from the Kenai River in Alaska, the method achieves a count error of 23%, which the authors present as evidence that the approach is feasible. Because it avoids the object detection and tracking pipelines used in prior work, it is aimed at sonar deployment sites where limited compute and connectivity make those pipelines inaccessible.
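
The 23% figure is a percentage count error. The exact formula is not given on this page, so the following is only a minimal sketch of one plausible definition: total absolute error across windows divided by the total true count. The function name `count_error_percent` and the aggregation choice are assumptions for illustration, not the paper's evaluation code.

```python
from typing import Sequence


def count_error_percent(true_counts: Sequence[float],
                        predicted_counts: Sequence[float]) -> float:
    """Aggregate absolute count error as a percentage of the total true count.

    Hypothetical metric for illustration: sum(|pred - true|) / sum(true) * 100.
    """
    if len(true_counts) != len(predicted_counts):
        raise ValueError("true_counts and predicted_counts must have the same length")

    total_true = sum(true_counts)
    if total_true == 0:
        raise ValueError("total true count is zero; percentage error is undefined")

    # Sum the per-window absolute errors before normalizing.
    total_abs_error = sum(abs(p - t) for t, p in zip(true_counts, predicted_counts))
    return 100.0 * total_abs_error / total_true
```

For example, true counts `[10, 0, 5]` and predictions `[8, 1, 5]` give 3 fish of absolute error over 15 true fish, which is 20%. Normalizing by the total rather than per window avoids dividing by zero in windows that contain no fish.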

📝 Abstract

Accurate estimates of salmon escapement - the number of fish migrating upstream to spawn - are key data for conservation and fishery management. Existing methods for salmon counting using high-resolution imaging sonar hardware are non-invasive and compatible with computer vision processing. Prior work in this area has utilized object detection and tracking based methods for automated salmon counting. However, these techniques remain inaccessible to many sonar deployment sites due to limited compute and connectivity in the field. We propose an alternative lightweight computer vision method for fish counting based on analyzing echograms - temporal representations that compress several hundred frames of imaging sonar video into a single image. We predict upstream and downstream counts within 200-frame time windows directly from echograms using a ResNet-18 model, and propose a set of domain-specific image augmentations and a weakly-supervised training protocol to further improve results. We achieve a count error of 23% on representative data from the Kenai River in Alaska, demonstrating the feasibility of our approach.
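
To make the idea of an echogram concrete, here is a minimal NumPy sketch of compressing a sonar clip into one image per 200-frame window. It assumes the clip is an array of shape `(frames, height, width)` and that each frame is reduced to a single column by taking the maximum intensity along one spatial axis. Both the reduction (max) and the choice of axis are assumptions made for illustration; the abstract does not specify how the authors build their echograms, and `make_echograms` is a hypothetical name.

```python
import numpy as np


def make_echograms(frames: np.ndarray, window: int = 200,
                   collapse_axis: int = 2) -> list[np.ndarray]:
    """Compress a sonar clip into one 2-D echogram per time window.

    frames: array of shape (T, H, W).
    collapse_axis: spatial axis (1 or 2) reduced by max; an assumption,
        not necessarily the projection used in the paper.
    Returns a list of arrays of shape (remaining_spatial_dim, window),
    where column t summarizes frame t of that window.
    """
    if frames.ndim != 3:
        raise ValueError("frames must have shape (T, H, W)")
    if collapse_axis not in (1, 2):
        raise ValueError("collapse_axis must be 1 or 2")

    echograms = []
    # Step through non-overlapping windows; a trailing partial window is dropped.
    for start in range(0, frames.shape[0] - window + 1, window):
        clip = frames[start:start + window]      # (window, H, W)
        columns = clip.max(axis=collapse_axis)   # (window, remaining dim)
        echograms.append(columns.T)              # time runs along the x-axis
    return echograms
```

Each resulting image could then be passed to an image model such as ResNet-18 with two regression outputs, one for the upstream count and one for the downstream count, which is the prediction setup the abstract describes.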

Problem

Research questions and friction points this paper is trying to address.

Lightweight fish counting method

Analyzing echograms for salmon counts

Improving accuracy with image augmentations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight echogram analysis

Direct count prediction with a ResNet-18 model

Domain-specific image augmentations

🔎 Similar Papers

FMRFT: Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking