AI Summary
Underwater passive acoustic monitoring (UPAM) suffers from highly variable environmental noise, complex propagation effects, and mixed sound sources, which lead to poor model stability and limited generalization across scenarios. To address these challenges, we propose GetNetUPAM, an ecology-informed evaluation framework that introduces, for the first time in marine bioacoustics, site-year nested cross-validation and a quantitative environmental diversity metric. Additionally, we design ARPA-N, an adaptive-resolution pooling and spatial-attention network tailored to spectrograms of irregular dimensions. Under GetNetUPAM, ARPA-N achieves a 14.4% gain in average precision over DenseNet baselines and a log2-scale order-of-magnitude drop in variability across all metrics, delivering consistent detection performance across sites and years. Collectively, GetNetUPAM and ARPA-N advance scalable, robust bioacoustic monitoring methods for real-world marine ecosystems.
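To make the idea of pooling over irregular spectrogram sizes concrete, the following is a minimal sketch of adaptive average pooling in plain Python. It is not the ARPA-N implementation, and it omits the spatial-attention component entirely. The function name `adaptive_avg_pool_2d`, the output grid size, and the toy inputs are illustrative assumptions. The sketch uses a floor/ceil binning convention commonly used for adaptive average pooling, so that inputs of different shapes map to the same fixed-size output.

```python
def adaptive_avg_pool_2d(spec, out_rows, out_cols):
    """Average-pool a 2D list (rows x cols) onto a fixed out_rows x out_cols grid.

    Hypothetical helper for illustration only; not taken from ARPA-N.
    """
    in_rows, in_cols = len(spec), len(spec[0])

    def bins(n_in, n_out):
        # Bin i covers [floor(i * n_in / n_out), ceil((i + 1) * n_in / n_out)).
        # Integer arithmetic avoids floating-point rounding in the bin edges.
        return [
            (i * n_in // n_out, -((-(i + 1) * n_in) // n_out))
            for i in range(n_out)
        ]

    pooled = []
    for r0, r1 in bins(in_rows, out_rows):
        row = []
        for c0, c1 in bins(in_cols, out_cols):
            cells = [spec[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


# Two toy "spectrograms" with different shapes (made-up values).
spec_a = [[float(r * 6 + c) for c in range(6)] for r in range(4)]  # 4 x 6
spec_b = [[1.0 for _ in range(7)] for _ in range(5)]               # 5 x 7

# Both are reduced to the same 2 x 3 grid, whatever their input size.
fixed_a = adaptive_avg_pool_2d(spec_a, 2, 3)
fixed_b = adaptive_avg_pool_2d(spec_b, 2, 3)
```

Because the output shape depends only on `out_rows` and `out_cols`, later layers can have fixed dimensions even when recordings produce spectrograms of different lengths or frequency resolutions.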
Abstract
Underwater Passive Acoustic Monitoring (UPAM) provides rich spatiotemporal data for long-term ecological analysis, but intrinsic noise and complex signal dependencies hinder model stability and generalization. Multilayered windowing has improved target sound localization, yet variability from shifting ambient noise, diverse propagation effects, and mixed biological and anthropogenic sources demands robust architectures and rigorous evaluation. We introduce GetNetUPAM, a hierarchical nested cross-validation framework designed to quantify model stability under ecologically realistic variability. Data are partitioned into distinct site-year segments, preserving recording heterogeneity and ensuring each validation fold reflects a unique environmental subset, reducing overfitting to localized noise and sensor artifacts. Site-year blocking enforces evaluation against genuine environmental diversity, while standard cross-validation on random subsets measures generalization across UPAM's full signal distribution, a dimension absent from current benchmarks. Using GetNetUPAM as the evaluation backbone, we propose the Adaptive Resolution Pooling and Attention Network (ARPA-N), a neural architecture for irregular spectrogram dimensions. Adaptive pooling with spatial attention extends the receptive field, capturing global context without excessive parameters. Under GetNetUPAM, ARPA-N achieves a 14.4% gain in average precision over DenseNet baselines and a log2-scale order-of-magnitude drop in variability across all metrics, enabling consistent detection across site-year folds and advancing scalable, accurate bioacoustic monitoring.
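The site-year partitioning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the GetNetUPAM code: the record fields `site`, `year`, and `clip`, the function names, and the toy data are all hypothetical. Each outer fold holds out one site-year segment for testing, and the remaining site-years are re-split the same way for model selection, so no clip from a held-out site-year ever appears in training.

```python
from collections import defaultdict


def site_year_folds(records):
    """Yield (held_out_key, train_idx, test_idx), one fold per (site, year)."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[(rec["site"], rec["year"])].append(idx)
    keys = sorted(groups)
    for key in keys:
        test_idx = groups[key]
        train_idx = [i for k in keys if k != key for i in groups[k]]
        yield key, train_idx, test_idx


def nested_site_year_folds(records):
    """Outer folds hold out one site-year for testing; inner folds re-split
    the remaining site-years into train/validation for model selection."""
    for outer_key, outer_train, outer_test in site_year_folds(records):
        train_records = [records[i] for i in outer_train]
        inner = []
        for inner_key, tr, va in site_year_folds(train_records):
            # Map inner positions back to indices in the original list.
            inner.append((
                inner_key,
                [outer_train[i] for i in tr],
                [outer_train[i] for i in va],
            ))
        yield outer_key, outer_test, inner


# Hypothetical toy metadata: two sites, two years, six clips.
records = [
    {"site": "A", "year": 2019, "clip": "a19_0"},
    {"site": "A", "year": 2019, "clip": "a19_1"},
    {"site": "A", "year": 2020, "clip": "a20_0"},
    {"site": "B", "year": 2019, "clip": "b19_0"},
    {"site": "B", "year": 2019, "clip": "b19_1"},
    {"site": "B", "year": 2020, "clip": "b20_0"},
]

folds = list(nested_site_year_folds(records))
for outer_key, outer_test, inner in folds:
    print(outer_key, "test clips:", [records[i]["clip"] for i in outer_test])
```

Splitting by site-year rather than by random clip is what keeps localized noise and sensor artifacts from leaking between training and evaluation. A random split over the same records would typically place clips from one deployment on both sides of the split.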