Can Masked Autoencoders Also Listen to Birds?

📅 2025-04-17
🤖 AI Summary
Problem: General-purpose Masked Autoencoders (MAEs) struggle to capture the domain-specific acoustic features that fine-grained avian vocalization classification in bioacoustic monitoring depends on. Method: We propose Bird-MAE, the first MAE architecture specifically designed for avian acoustics. Leveraging the large-scale BirdSet dataset, we introduce an acoustic-aware MAE framework with optimized masking strategies, domain-adaptive pretraining, and novel use of frozen representations. We further devise prototypical probing, a parameter-efficient, highly discriminative transfer method for frozen representations. Results: Bird-MAE achieves state-of-the-art performance across all BirdSet downstream tasks. In multi-label classification, it significantly outperforms generic Audio-MAE baselines in mean Average Precision (mAP). Prototypical probing yields up to 37% higher mAP than linear probing and approaches full-parameter fine-tuning performance, with an average gap of only about 3%.

📝 Abstract
Masked Autoencoders (MAEs) pretrained on AudioSet fail to capture the fine-grained acoustic characteristics of specialized domains such as bioacoustic monitoring. Bird sound classification is critical for assessing environmental health, yet general-purpose models inadequately address its unique acoustic challenges. To address this, we introduce Bird-MAE, a domain-specialized MAE pretrained on the large-scale BirdSet dataset. We explore adjustments to pretraining, fine-tuning, and the use of frozen representations. Bird-MAE achieves state-of-the-art results across all BirdSet downstream tasks, substantially improving multi-label classification performance compared to the general-purpose Audio-MAE baseline. Additionally, we propose prototypical probing, a parameter-efficient method for leveraging MAEs' frozen representations. Bird-MAE's prototypical probes outperform linear probing by up to 37% in mAP and narrow the gap to fine-tuning to approximately 3% on average on BirdSet.
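
The results above are reported in mean Average Precision (mAP) for multi-label classification. As a rough illustration of what that metric measures, the sketch below computes a macro-averaged mAP in plain Python. The function names, the toy labels and scores, and the choice to skip classes without positives are assumptions for illustration; the exact mAP variant used in the BirdSet evaluation is not specified here.

```python
def average_precision(labels, scores):
    """AP for one class: mean precision at each rank where a positive occurs."""
    # Rank clips by descending score (ties keep their original order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    # A class with no positives has no defined AP.
    return precision_sum / hits if hits else None


def mean_average_precision(label_matrix, score_matrix):
    """Macro mAP: average AP over classes that have at least one positive."""
    num_classes = len(label_matrix[0])
    aps = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        ap = average_precision(labels, scores)
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps) if aps else float("nan")


# Toy data (hypothetical): 4 clips, 2 species, multi-label targets.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.6, 0.1]]
map_value = mean_average_precision(y_true, y_score)
```

Because AP depends only on the ranking of scores within each class, the metric is insensitive to score calibration, which suits comparisons between probes and fine-tuned models.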
Problem

Research questions and friction points this paper is trying to address.

Improving bird sound classification for environmental monitoring
Adapting Masked Autoencoders for bioacoustic domain challenges
Enhancing multi-label classification performance with Bird-MAE
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-specialized Bird-MAE for bioacoustic monitoring
Pretrained on large-scale BirdSet dataset
Prototypical probing improves frozen representation usage
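
This page does not describe how prototypical probing works internally. Purely as an illustration of the general idea of a prototype-based probe on frozen features, the sketch below scores each class by how well its prototypes match the best-fitting patch embedding. Everything in it is an assumption rather than the authors' method: the function names, the use of cosine similarity, the max over patches, and the mean over a class's prototypes. In such a setup only the prototypes would be trained while the encoder stays frozen; the sketch shows the forward pass only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prototype_logits(patch_embeddings, class_prototypes):
    """One logit per class from frozen patch embeddings.

    For each prototype, take the similarity of its best-matching patch,
    then average those values over the prototypes of the class.
    """
    logits = []
    for prototypes in class_prototypes:
        best = [
            max(cosine(patch, proto) for patch in patch_embeddings)
            for proto in prototypes
        ]
        logits.append(sum(best) / len(best))
    return logits


# Hypothetical frozen encoder output: 3 patches with 2-dimensional embeddings.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical learnable prototypes: class 0 has one, class 1 has two.
prototypes = [
    [[1.0, 0.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
logits = prototype_logits(patches, prototypes)
```

By contrast, a linear probe would apply a single weight matrix to one pooled embedding per clip. A prototype-based probe keeps patch-level detail, which is one plausible reason such probes could help on fine-grained sounds.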