🤖 AI Summary
This study addresses two challenges in the automated detection of belugas and harp seals in very-high-resolution (VHR) satellite imagery: high annotation cost and frequent missed detections, particularly of ambiguous or uncertain individuals. We propose a weakly supervised detection framework that combines sparse point annotations with the Segment Anything Model (SAM). To our knowledge, this is the first work to adapt SAM to point-annotated marine mammal imagery, establishing an automated pipeline from sparse points to precise bounding boxes. We also introduce an “uncertain beluga” class, which to our knowledge has not been used before, so that ambiguous individuals are detected rather than excluded. YOLOv8 trained on these annotations achieves F₁-scores of 72.2% for belugas overall and 70.3% for harp seals on VHR satellite images, outperforming conventional buffer-based annotation, especially in densely populated scenes.
📝 Abstract
Very high-resolution (VHR) satellite imagery has emerged as a powerful tool for monitoring marine animals on a large scale. However, existing deep learning-based whale detection methods usually require manually created, high-quality bounding box annotations, which are labor-intensive to produce. Moreover, existing studies often exclude ``uncertain whales'', individuals that have ambiguous appearances in satellite imagery, limiting the applicability of these models in real-world scenarios. To address these limitations, this study introduces an automated pipeline for detecting beluga whales and harp seals in VHR satellite imagery. The pipeline leverages point annotations and the Segment Anything Model (SAM) to generate precise bounding box annotations, which are used to train YOLOv8 for multiclass detection of certain whales, uncertain whales, and harp seals. Experimental results demonstrated that SAM-generated annotations significantly improved detection performance, achieving higher $\text{F}_\text{1}$-scores than traditional buffer-based annotations. YOLOv8 trained on SAM-labeled boxes achieved an $\text{F}_\text{1}$-score of 72.2% for whales overall and 70.3% for harp seals, with superior performance in dense scenes. The proposed approach not only reduces the manual effort required for annotation but also enhances the detection of uncertain whales, offering a more comprehensive solution for marine animal monitoring. This method holds great potential for extending to other species, habitats, and remote sensing platforms, as well as for estimating whale biometrics, thereby advancing ecological monitoring and conservation efforts. The code for our labeling and detection pipeline is publicly available at http://github.com/voyagerxvoyagerx/beluga-seeker .
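To make the contrast between the two annotation strategies concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a buffer-based annotation is a fixed-size square centred on the point, and that a SAM-derived annotation is the tight box around a binary mask that SAM would return for the same point. The mask here is hand-written rather than produced by SAM. The function names, the buffer size, and the class id mapping (0 = certain whale, 1 = uncertain whale, 2 = harp seal) are hypothetical. The output line uses the standard YOLO label format: class id followed by normalized centre coordinates, width, and height.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def buffer_box(point: Tuple[int, int], half_size: int, width: int, height: int) -> Box:
    """Fixed-size square centred on a point annotation, clipped to the image."""
    x, y = point
    return (
        max(0, x - half_size),
        max(0, y - half_size),
        min(width - 1, x + half_size),
        min(height - 1, y + half_size),
    )


def mask_box(mask: List[List[int]]) -> Box:
    """Tight box around the non-zero pixels of a binary mask (rows of 0/1)."""
    ys = [r for r, row in enumerate(mask) if any(row)]
    xs = [c for row in mask for c, v in enumerate(row) if v]
    if not ys:
        raise ValueError("mask is empty")
    return (min(xs), min(ys), max(xs), max(ys))


def to_yolo(box: Box, class_id: int, width: int, height: int) -> str:
    """Format a box as a YOLO label line, normalized to the image size."""
    x0, y0, x1, y1 = box
    w = x1 - x0 + 1
    h = y1 - y0 + 1
    xc = x0 + w / 2
    yc = y0 + h / 2
    return f"{class_id} {xc / width:.4f} {yc / height:.4f} {w / width:.4f} {h / height:.4f}"


# Hand-written 8x8 stand-in for a SAM mask of one elongated animal.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(buffer_box((3, 3), half_size=2, width=8, height=8))  # square box around the point
print(to_yolo(mask_box(mask), class_id=0, width=8, height=8))  # tight box as a YOLO label
```

In this toy case the buffer box is a 5 by 5 square regardless of the animal's shape, while the mask-derived box is 4 by 2 pixels and follows the animal's extent. In dense scenes, fixed-size squares around neighbouring points would overlap more than tight boxes, which is one plausible reason the mask-derived labels perform better there.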