🤖 AI Summary
To improve the accuracy and robustness of automatic recognition of grazing behaviour in cattle, this paper proposes fusing acoustic and inertial signals within a single deep neural network. The authors compare architectures that fuse information at the data, feature, and decision levels, and find that feature-level fusion—realized as a CNN-RNN-Dense hybrid that extracts features automatically and independently from each signal, avoiding handcrafted features—performs best, outperforming data- and decision-level fusion by at least 0.14 in F1-score. Evaluated against state-of-the-art traditional and deep learning methods, the proposed model reaches an F1-score of 0.802, a 14% improvement over previous approaches. An ablation study and a post-training quantization evaluation complete the work, supporting its use for intelligent pasture management and early detection of health anomalies.
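The feature-level fusion idea described above—separate convolutional and recurrent branches per modality, with learned features concatenated before a shared dense classifier—could be sketched as follows. This is a minimal illustration in PyTorch; the layer sizes, window lengths, and class count are placeholders, not the paper's actual configuration:

```python
import torch
import torch.nn as nn


class FeatureFusionNet(nn.Module):
    """Sketch of feature-level fusion: each modality gets its own
    CNN + GRU encoder; features are concatenated before a dense head.
    All dimensions are illustrative assumptions."""

    def __init__(self, n_classes=5, audio_channels=1, imu_channels=6):
        super().__init__()
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(audio_channels, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        self.imu_cnn = nn.Sequential(
            nn.Conv1d(imu_channels, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.audio_rnn = nn.GRU(16, 32, batch_first=True)
        self.imu_rnn = nn.GRU(16, 32, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, n_classes)
        )

    def forward(self, audio, imu):
        # audio: (batch, audio_channels, T_a); imu: (batch, imu_channels, T_i)
        a = self.audio_cnn(audio).transpose(1, 2)  # (batch, T, 16) for the GRU
        m = self.imu_cnn(imu).transpose(1, 2)
        _, a_h = self.audio_rnn(a)                 # final hidden state per branch
        _, m_h = self.imu_rnn(m)
        fused = torch.cat([a_h[-1], m_h[-1]], dim=1)  # feature-level fusion
        return self.head(fused)


model = FeatureFusionNet()
audio = torch.randn(2, 1, 400)  # two hypothetical raw-sound windows
imu = torch.randn(2, 6, 100)    # matching accelerometer/gyroscope windows
logits = model(audio, imu)
print(logits.shape)  # torch.Size([2, 5])
```

Note that each branch can have a receptive field and sampling rate suited to its sensor (microphone vs. accelerometer), which is one motivation for fusing learned features rather than raw data.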
📝 Abstract
Monitoring feeding behaviour is a relevant task for efficient herd management and the effective use of available resources in grazing cattle. The ability to automatically recognise animals' feeding activities through the identification of specific jaw movements allows for the improvement of diet formulation, as well as the early detection of metabolic problems and symptoms of animal discomfort, among other benefits. The use of sensors to obtain signals for such monitoring has become popular in the last two decades. The most frequently employed sensors include accelerometers, microphones, and cameras, each with its own set of advantages and drawbacks. An unexplored aspect is the simultaneous use of multiple sensors, combining their signals to enhance the precision of the estimations. In this direction, this work introduces a deep neural network based on the fusion of acoustic and inertial signals, composed of convolutional, recurrent, and dense layers. The main advantage of this model is that it combines the signals by automatically extracting features from each of them independently. The model emerged from an exploration and comparison of neural network architectures proposed in this work that carry out information fusion at different levels. Feature-level fusion outperformed data- and decision-level fusion by at least 0.14 in F1-score. Moreover, a comparison with state-of-the-art machine learning methods is presented, including traditional and deep learning approaches. The proposed model yielded an F1-score of 0.802, representing a 14% increase over previous methods. Finally, results from an ablation study and a post-training quantization evaluation are also reported.
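The post-training quantization mentioned at the end of the abstract can be illustrated with PyTorch's dynamic quantization API, which converts the weights of selected layer types to int8 after training, with no retraining required. The model below is a stand-in dense head, not the paper's network:

```python
import torch
import torch.nn as nn

# Stand-in for a trained classifier head (hypothetical sizes).
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 5))
model.eval()

# Post-training dynamic quantization: Linear weights are stored in int8
# and dequantized on the fly at inference, shrinking the model size.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
out = qmodel(x)
print(out.shape)  # torch.Size([1, 5])
```

The typical evaluation, as in the abstract's quantization study, is to compare the quantized model's F1-score and size against the float32 original.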