AI Summary
To address the scarcity of bioacoustic data, high annotation costs, and limited coverage across taxa, this study introduces iNatSounds, the first large-scale, multi-taxon (birds, mammals, insects, reptiles, amphibians), weakly labeled global bioacoustic dataset. It comprises 230,000 audio recordings from over 5,500 species, sourced from iNaturalist citizen science observations. Each field recording carries a single species label, and the dataset supports training with both single-species (multiclass) and multilabel objectives. The authors benchmark multiple backbone architectures under both objectives. They then evaluate iNatSounds-pretrained models on strongly labeled downstream datasets, showing that the weakly labeled data is a useful pretraining resource. The dataset is publicly released as a single archive, providing a foundational resource for ecological AI and participatory biodiversity monitoring.
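The summary mentions both single-species classification and multi-label learning. The sketch below contrasts the two kinds of objective for one recording that carries a single species label. It assumes a softmax cross-entropy for the multiclass case and a per-class sigmoid binary cross-entropy, averaged over classes, for the multilabel case. These are standard choices, not necessarily the exact losses used for iNatSounds, and all function names are hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multiclass_loss(logits: List[float], label: int) -> float:
    """Softmax cross-entropy: all classes compete for one unit of probability."""
    return -log_softmax(logits)[label]


def log_sigmoid(x: float) -> float:
    # Stable log(sigmoid(x)) for large positive and large negative x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def multilabel_loss(logits: List[float], label: int) -> float:
    """Per-class sigmoid binary cross-entropy, averaged over classes.

    The single recording-level species is the only positive target; every
    other class is treated as a negative, even if it is audible but unlabeled.
    """
    total = 0.0
    for k, z in enumerate(logits):
        if k == label:
            total -= log_sigmoid(z)   # -log p_k
        else:
            total -= log_sigmoid(-z)  # -log(1 - p_k), since 1 - sigmoid(z) = sigmoid(-z)
    return total / len(logits)


# Usage with hypothetical scores for three species; species 0 is the label.
logits = [2.0, 0.5, -1.0]
print(multiclass_loss(logits, label=0))
print(multilabel_loss(logits, label=0))
```

Under softmax, raising any other class's logit lowers the labeled species' probability, so the model is pushed toward a single winner per recording. Under the sigmoid objective the labeled species' term does not depend on the other logits, although an unlabeled but audible species is still scored as a negative.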
Abstract
We present the iNaturalist Sounds Dataset (iNatSounds), a collection of 230,000 audio files capturing sounds from over 5,500 species, contributed by more than 27,000 recordists worldwide. The dataset encompasses sounds from birds, mammals, insects, reptiles, and amphibians, with audio and species labels derived from observations submitted to iNaturalist, a global citizen science platform. Recordings vary in length, and each carries a single species annotation. We benchmark multiple backbone architectures, comparing multiclass classification objectives with multilabel objectives. Despite the weak labels, we demonstrate that iNatSounds serves as a useful pretraining resource by benchmarking it on strongly labeled downstream evaluation datasets. The dataset is available as a single, freely accessible archive, promoting accessibility and research in this important domain. We envision models trained on this data powering next-generation public engagement applications and assisting biologists, ecologists, and land use managers in processing large audio collections, thereby contributing to the understanding of species compositions in diverse soundscapes.
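Recordings vary in length and each carries one species label, so the label says which species is present somewhere in the file, not when it vocalizes. One common way to train on such weak labels is to cut each recording into fixed-length windows that inherit the recording's label. Window-level scores are then averaged back into one prediction per recording. This approach is an assumption here; the abstract does not confirm it. The window and hop lengths (in samples) and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_into_windows(samples: Sequence[float], window: int, hop: int) -> List[List[float]]:
    """Cut a variable-length recording into fixed-length windows.

    The last window is zero-padded, so even a recording shorter than
    `window` yields exactly one window.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    windows: List[List[float]] = []
    start = 0
    while True:
        chunk = list(samples[start:start + window])
        chunk += [0.0] * (window - len(chunk))  # zero-pad a short tail
        windows.append(chunk)
        if start + window >= len(samples):
            break  # this window already reaches the end of the recording
        start += hop
    return windows


def make_training_pairs(
    samples: Sequence[float], label: int, window: int, hop: int
) -> List[Tuple[List[float], int]]:
    """Weak labeling: every window inherits the recording-level species label."""
    return [(w, label) for w in split_into_windows(samples, window, hop)]


def aggregate_scores(window_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-window class scores into one recording-level score vector."""
    if not window_scores:
        raise ValueError("need at least one window")
    n = len(window_scores)
    num_classes = len(window_scores[0])
    return [sum(w[k] for w in window_scores) / n for k in range(num_classes)]


def predict_species(window_scores: Sequence[Sequence[float]]) -> int:
    """Return the index of the species with the highest averaged score."""
    scores = aggregate_scores(window_scores)
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: a hypothetical 10-sample "recording", 4-sample windows, hop of 2.
pairs = make_training_pairs([0.1] * 10, label=7, window=4, hop=2)
print(len(pairs), pairs[0][1])
print(predict_species([[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]))
```

Averaging is only one pooling choice. Taking the maximum over windows is a common alternative when a species vocalizes only briefly within a long recording.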