Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries

📅 2025-12-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of manual re-identification of visually similar fish individuals in large-scale electronic monitoring videos for fisheries, this paper proposes an automated fine-grained fish Re-ID framework. We introduce a novel hard triplet mining strategy and a dataset-adaptive normalization image transformation pipeline, revealing through systematic analysis that viewpoint variation poses greater difficulty than occlusion. Our method employs the Swin-T Vision Transformer as the backbone for deep metric learning, integrating hard triplet loss with a customized data augmentation pipeline. Evaluated on the AutoFish dataset, it achieves 90.43% Rank-1 accuracy and 41.65% mAP@k—substantially outperforming ResNet-50—and demonstrates the superiority of Transformer architectures for fine-grained fish re-identification. This work provides a scalable, robust technical foundation for intelligent marine resource monitoring and management.
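The hard triplet mining mentioned above is not detailed on this page; a common form is batch-hard mining, where each anchor is paired with its farthest same-identity sample and its closest other-identity sample. The sketch below (NumPy, with an assumed `margin` of 0.3 and function name of my own choosing) illustrates that idea, not the paper's exact implementation:

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet mining: for each anchor, take the hardest
    positive (farthest same-ID sample) and the hardest negative
    (closest other-ID sample), then apply a margin hinge loss."""
    # Pairwise Euclidean distances between all embeddings in the batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)

    same = labels[:, None] == labels[None, :]
    n = len(labels)
    pos_mask = same & ~np.eye(n, dtype=bool)   # same identity, not self
    neg_mask = ~same                           # different identity

    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)

    # Hinge: penalize anchors whose hardest positive is not at least
    # `margin` closer than their hardest negative; average over anchors.
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()
```

When identities are well separated in embedding space the loss is zero; mined hard cases (e.g. two visually similar fish of different identities sitting close together) produce a positive gradient signal.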

📝 Abstract
Accurate fisheries data are crucial for effective and sustainable marine resource management. With the recent adoption of Electronic Monitoring (EM) systems, more video data is now being collected than can be feasibly reviewed manually. This paper addresses this challenge by developing an optimized deep learning pipeline for automated fish re-identification (Re-ID) using the novel AutoFish dataset, which simulates conveyor-belt EM systems with six similar-looking fish species. We demonstrate that key Re-ID metrics (R1 and mAP@k) are substantially improved by using hard triplet mining in conjunction with a custom image transformation pipeline that includes dataset-specific normalization. By employing these strategies, we demonstrate that the Vision Transformer-based Swin-T architecture consistently outperforms the Convolutional Neural Network-based ResNet-50, achieving peak performance of 41.65% mAP@k and 90.43% Rank-1 accuracy. An in-depth analysis reveals that the primary challenge is distinguishing visually similar individuals of the same species (intra-species errors), where viewpoint inconsistency proves significantly more detrimental than partial occlusion. The source code and documentation are available at: https://github.com/msamdk/Fish_Re_Identification.git
Problem

Research questions and friction points this paper is trying to address.

Automated fish re-identification from electronic monitoring video data
Distinguishing visually similar fish species for sustainable fisheries management
Improving re-identification accuracy using deep learning and dataset-specific transformations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hard triplet mining for improved re-identification metrics
Custom image transformation with dataset-specific normalization
Swin-T Vision Transformer outperforms ResNet-50 CNN
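The dataset-specific normalization listed above replaces generic ImageNet channel statistics with statistics computed from the training images themselves. A minimal sketch of that idea (function names and the `eps` constant are my own, not from the paper):

```python
import numpy as np

def dataset_channel_stats(images):
    """Per-channel mean/std over the whole training set
    (images scaled to [0, 1], shape (N, H, W, 3))."""
    mean = images.mean(axis=(0, 1, 2))
    std = images.std(axis=(0, 1, 2))
    return mean, std

def normalize(image, mean, std, eps=1e-8):
    # Zero-mean, unit-variance per channel using the dataset's own
    # statistics instead of ImageNet defaults.
    return (image - mean) / (std + eps)
```

For imagery dominated by a single background (e.g. a conveyor belt), dataset-derived statistics center the inputs far better than ImageNet's, which is the plausible motivation for this step.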
Samitha Nuwan Thilakarathna
DTU Aqua - National Institute of Aquatic Resources, Technical University of Denmark
Ercan Avsar
DTU Aqua - National Institute of Aquatic Resources, Technical University of Denmark
Martine Mathias Nielsen
DTU Aqua - National Institute of Aquatic Resources, Technical University of Denmark
Malte Pedersen
Postdoc, Aalborg University/Pioneer Centre for AI
computer vision · marine vision · machine learning