BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics

📅 2024-03-15

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

To address the lack of large-scale, high-quality benchmark datasets for avian acoustic recognition, this paper introduces BirdSet, an open-source avian audio benchmark covering nearly 10,000 species, with over 6,800 hours of training audio and more than 400 hours of evaluation data. Its key contributions are: (1) roughly 18× as many classes as AudioSet and 17% more training hours; (2) a unified annotation schema across eight heterogeneous, strongly labeled evaluation sets, enabling multi-label classification, covariate shift analysis, and self-supervised learning; and (3) Hugging Face–hosted infrastructure with standardized preprocessing, PyTorch-compatible code, and implementations of multi-label loss functions. The authors benchmark six well-known deep learning models in multi-label classification across three training scenarios and provide a codebase for reproducing the results.
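As a rough illustration of what multi-label classification means here (several species can vocalize in the same clip, so each class gets its own independent yes/no target), the sketch below encodes a multi-hot target and computes a binary cross-entropy loss from raw logits. This is not taken from the BirdSet codebase: the function names `multi_hot` and `bce_with_logits`, the 5-class setup, and the logit values are all assumptions chosen for illustration.

```python
import math


def multi_hot(active, num_classes):
    """Encode a set of active class indices as a 0/1 vector."""
    vec = [0.0] * num_classes
    for idx in active:
        vec[idx] = 1.0
    return vec


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes, computed from raw logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Hypothetical 5-class example: classes 1 and 3 are both present in one clip.
targets = multi_hot({1, 3}, num_classes=5)
logits = [-2.0, 3.0, -1.5, 0.5, -4.0]  # made-up model outputs
loss = bce_with_logits(logits, targets)
print(targets, round(loss, 4))
```

Unlike a softmax cross-entropy, each class is scored independently, so the loss does not force the predicted probabilities to sum to one across species.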

📝 Abstract

Deep learning (DL) has greatly advanced audio classification, yet the field is limited by the scarcity of large-scale benchmark datasets that have propelled progress in other domains. While AudioSet is a pivotal step to bridge this gap as a universal-domain dataset, its restricted accessibility and limited range of evaluation use cases challenge its role as the sole resource. Therefore, we introduce BirdSet, a large-scale benchmark dataset for audio classification focusing on avian bioacoustics. BirdSet surpasses AudioSet with over 6,800 recording hours (↑17%) from nearly 10,000 classes (↑18×) for training and more than 400 hours (↑7×) across eight strongly labeled evaluation datasets. It serves as a versatile resource for use cases such as multi-label classification, covariate shift, or self-supervised learning. We benchmark six well-known DL models in multi-label classification across three distinct training scenarios and outline further evaluation use cases in audio classification. We host our dataset on Hugging Face for easy accessibility and offer an extensive codebase to reproduce our results.

Problem

Research questions and friction points this paper is trying to address.

Avian Sound Database

Deep Learning Models

Bird Call Recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

BirdSet

Sound Recognition

Deep Learning
