MHAFF: Multi-Head Attention Feature Fusion of CNN and Transformer for Cattle Identification

📅 2025-01-09
🤖 AI Summary
Problem: Convolutional Neural Networks (CNNs) struggle to model long-range dependencies in cattle muzzle images, while conventional feature fusion strategies (addition and concatenation) either lose discriminative information or increase feature dimensionality, and neither models interactions between the features being fused. Method: The authors propose Multi-Head Attention Feature Fusion (MHAFF), introduced for the first time in cattle identification, which uses multi-head attention to fuse CNN and transformer features, capturing relations between the two feature types while preserving their original information. Results: On two publicly available cattle datasets, MHAFF reaches 99.88% and 99.52% identification accuracy, respectively, outperforming addition, concatenation, and existing cattle identification methods while converging quickly.

📝 Abstract
Convolutional Neural Networks (CNNs) have drawn researchers' attention for identifying cattle from muzzle images. However, CNNs often fail to capture long-range dependencies within the complex patterns of the muzzle, a limitation that transformers address. This inspired us to fuse the strengths of CNNs and transformers in muzzle-based cattle identification. Addition and concatenation have been the most commonly used techniques for feature fusion. However, addition fails to preserve discriminative information, while concatenation increases dimensionality. Both are simple operations that cannot discover the relationships or interactions between the features being fused. To overcome these issues, this research introduces a novel approach called Multi-Head Attention Feature Fusion (MHAFF), used for the first time in cattle identification. MHAFF captures relations between the different types of features being fused while preserving their original information. Experiments show that MHAFF outperforms addition, concatenation, and existing cattle identification methods in accuracy on two publicly available cattle datasets. MHAFF converges quickly and achieves accuracies of 99.88% and 99.52% on the two datasets, respectively.
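The contrast the abstract draws between addition, concatenation, and attention-based fusion can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify the MHAFF architecture, so the function name `multi_head_attention_fusion`, the token shapes (4 tokens of width 8 per branch), the choice of two heads, and the random projection matrices standing in for learned weights are all assumptions made for illustration. The sketch simply applies generic multi-head self-attention over the stacked token sets of both branches.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention_fusion(cnn_tokens, vit_tokens, num_heads, rng):
    """Generic multi-head self-attention over two stacked token sets.

    Hypothetical helper for illustration only; random matrices stand in
    for the learned query/key/value/output projections.
    """
    tokens = np.concatenate([cnn_tokens, vit_tokens], axis=0)  # (n, d)
    n, d = tokens.shape
    assert d % num_heads == 0, "feature width must divide evenly across heads"
    d_head = d // num_heads

    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )

    def split_heads(x):
        # (n, d) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(tokens @ w_q)
    k = split_heads(tokens @ w_k)
    v = split_heads(tokens @ w_v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n, n)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (heads, n, d_head)

    # Merge heads back to width d and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(n, d)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
cnn_tokens = rng.standard_normal((4, 8))  # assumed CNN-branch features
vit_tokens = rng.standard_normal((4, 8))  # assumed transformer-branch features

added = cnn_tokens + vit_tokens  # element-wise sum, width stays 8
concatenated = np.concatenate([cnn_tokens, vit_tokens], axis=1)  # width 16
fused, attn = multi_head_attention_fusion(
    cnn_tokens, vit_tokens, num_heads=2, rng=rng
)

print(added.shape, concatenated.shape, fused.shape, attn.shape)
```

In this toy setup, addition keeps the width at 8 but merges the two sources irreversibly, concatenation doubles the width to 16, and the attention variant keeps the width at 8 while `attn` exposes, per head, how strongly each token attends to tokens from either branch.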
Problem

Research questions and friction points this paper is trying to address.

Cattle Recognition
CNN-Transformer Integration
Information Fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

MHAFF
Multi-Head Attention
Feature Fusion
Rabin Dulal
Charles Sturt University
Machine Learning, Deep Learning, Computer Vision, Object Identification
Lihong Zheng
Charles Sturt University
Muhammad Ashad Kabir
School of Computing, Mathematics and Engineering, Charles Sturt University, NSW, Australia; Food Agility CRC Ltd, Sydney, 2000, NSW, Australia; Gulbali Institute for Agriculture, Water and Environment, Wagga Wagga, 2650, NSW, Australia