🤖 AI Summary
Existing USV (ultrasonic vocalization) detection methods predominantly rely on machine learning, suffering from poor generalizability and unstable cross-dataset performance. To address this, we propose ContourUSV, the first machine-learning-free, time-frequency contour-driven framework for robust USV detection. It comprises spectrogram denoising, contour extraction, rule-based morphological post-processing, and comprehensive evaluation using multiple metrics (precision, recall, F1-score, and specificity). Concurrently, we release a new publicly available dataset to enhance cross-domain reliability. Evaluated on two benchmark datasets against three state-of-the-art (SOTA) methods, ContourUSV achieves average improvements of 1.51× in precision, 1.80× in F1-score, and 1.49× in specificity, while accelerating inference by 117×. These results demonstrate substantial gains in both detection reliability and computational efficiency.
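The four metrics listed above follow their standard definitions from confusion counts. As a minimal sketch (assuming detections have already been matched against manual annotations to yield TP/FP/FN/TN counts; this helper is illustrative, not part of the released code):

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard detection metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return precision, recall, f1, specificity
```

Specificity (the true-negative rate) matters here because USV recordings are dominated by silence and noise, so a detector can score well on recall alone while emitting many false positives.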
📄 Abstract
Analyzing ultrasonic vocalizations (USVs) is crucial for understanding rodents' affective states and social behaviors, but manual analysis is time-consuming and prone to errors. Automated USV detection systems have been developed to address these challenges. Yet, these systems often rely on machine learning and fail to generalize effectively to new datasets. To tackle these shortcomings, we introduce ContourUSV, an efficient automated system for detecting USVs from audio recordings. Our pipeline includes spectrogram generation, cleaning, pre-processing, contour detection, post-processing, and evaluation against manual annotations. To ensure robustness and reliability, we compared ContourUSV with three state-of-the-art systems using an existing open-access USV dataset (USVSEG) and a second dataset we are releasing publicly along with this paper. On average, across the two datasets, ContourUSV outperformed the other three systems with a 1.51× improvement in precision, 1.17× in recall, 1.80× in F1 score, and 1.49× in specificity, while achieving an average speedup of 117.07×.
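The detection stages described above (denoising relative to a noise floor, extracting contiguous contours, then applying rule-based post-processing) can be sketched in simplified form. The sketch below operates on per-frame spectrogram energy rather than full 2-D contours, and all thresholds (`thresh`, `max_gap`, `min_len`) are hypothetical illustrations, not the paper's actual parameters:

```python
def detect_calls(frame_energy, noise_floor, thresh=3.0,
                 max_gap=2, min_len=3):
    """Return [(start, end)] frame intervals of candidate USV calls."""
    # 1) "Denoise": keep frames whose energy clearly exceeds the
    #    estimated noise floor.
    active = [e - noise_floor > thresh for e in frame_energy]

    # 2) Contour/segment extraction: runs of consecutive active frames.
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(active)))

    # 3) Rule-based post-processing: merge segments separated by short
    #    gaps (a call briefly dipping below threshold) ...
    merged = []
    for s in segments:
        if merged and s[0] - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], s[1])
        else:
            merged.append(s)

    # 4) ... and discard segments too short to be a plausible call.
    return [s for s in merged if s[1] - s[0] >= min_len]
```

Because every stage is a deterministic rule rather than a learned model, this style of pipeline needs no training data, which is the property the paper credits for its cross-dataset robustness.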