🤖 AI Summary
To improve the accuracy and robustness of automatic recognition of grazing behaviour in cattle, this paper proposes fusing acoustic and inertial signals within a single deep neural network. The authors compare architectures that fuse information at the data, feature, and decision levels, and find that feature-level fusion—realized as a CNN-RNN-Dense hybrid that extracts features automatically and independently from each signal, avoiding handcrafted features—performs best, outperforming data- and decision-level fusion by at least 0.14 in F1-score. Evaluated against state-of-the-art traditional and deep learning methods, the proposed model reaches an F1-score of 0.802, a 14% improvement over previous approaches. An ablation study and a post-training quantization evaluation complete the work, supporting its use for intelligent pasture management and early detection of health anomalies.
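The feature-level fusion idea described above—separate convolutional and recurrent branches per modality, with learned features concatenated before a shared dense classifier—could be sketched as follows. This is a minimal illustration in PyTorch; the layer sizes, window lengths, and class count are placeholders, not the paper's actual configuration:

```python
import torch
import torch.nn as nn


class FeatureFusionNet(nn.Module):
    """Sketch of feature-level fusion: each modality gets its own
    CNN + GRU encoder; features are concatenated before a dense head.
    All dimensions are illustrative assumptions."""

    def __init__(self, n_classes=5, audio_channels=1, imu_channels=6):
        super().__init__()
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(audio_channels, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        self.imu_cnn = nn.Sequential(
            nn.Conv1d(imu_channels, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.audio_rnn = nn.GRU(16, 32, batch_first=True)
        self.imu_rnn = nn.GRU(16, 32, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, n_classes)
        )

    def forward(self, audio, imu):
        # audio: (batch, audio_channels, T_a); imu: (batch, imu_channels, T_i)
        a = self.audio_cnn(audio).transpose(1, 2)  # (batch, T, 16) for the GRU
        m = self.imu_cnn(imu).transpose(1, 2)
        _, a_h = self.audio_rnn(a)                 # final hidden state per branch
        _, m_h = self.imu_rnn(m)
        fused = torch.cat([a_h[-1], m_h[-1]], dim=1)  # feature-level fusion
        return self.head(fused)


model = FeatureFusionNet()
audio = torch.randn(2, 1, 400)  # two hypothetical raw-sound windows
imu = torch.randn(2, 6, 100)    # matching accelerometer/gyroscope windows
logits = model(audio, imu)
print(logits.shape)  # torch.Size([2, 5])
```

Note that each branch can have a receptive field and sampling rate suited to its sensor (microphone vs. accelerometer), which is one motivation for fusing learned features rather than raw data.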
📝 Abstract
Monitoring feeding behaviour is a relevant task for efficient herd management and the effective use of available resources in grazing cattle. The ability to automatically recognise animals' feeding activities through the identification of specific jaw movements allows for the improvement of diet formulation, as well as the early detection of metabolic problems and symptoms of animal discomfort, among other benefits. The use of sensors to obtain signals for such monitoring has become popular in the last two decades. The most frequently employed sensors include accelerometers, microphones, and cameras, each with its own set of advantages and drawbacks. An unexplored aspect is the simultaneous use of multiple sensors, combining their signals to enhance the precision of the estimations. In this direction, this work introduces a deep neural network based on the fusion of acoustic and inertial signals, composed of convolutional, recurrent, and dense layers. The main advantage of this model is that it combines the signals by automatically extracting features from each of them independently. The model emerged from an exploration and comparison of neural network architectures proposed in this work that carry out information fusion at different levels. Feature-level fusion outperformed data- and decision-level fusion by at least 0.14 in F1-score. Moreover, a comparison with state-of-the-art machine learning methods is presented, including traditional and deep learning approaches. The proposed model yielded an F1-score of 0.802, representing a 14% increase over previous methods. Finally, results from an ablation study and a post-training quantization evaluation are also reported.
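The post-training quantization mentioned at the end of the abstract can be illustrated with PyTorch's dynamic quantization API, which converts the weights of selected layer types to int8 after training, with no retraining required. The model below is a stand-in dense head, not the paper's network:

```python
import torch
import torch.nn as nn

# Stand-in for a trained classifier head (hypothetical sizes).
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 5))
model.eval()

# Post-training dynamic quantization: Linear weights are stored in int8
# and dequantized on the fly at inference, shrinking the model size.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
out = qmodel(x)
print(out.shape)  # torch.Size([1, 5])
```

The typical evaluation, as in the abstract's quantization study, is to compare the quantized model's F1-score and size against the float32 original.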