🤖 AI Summary
Audio classifiers suffer from domain shift when acoustic conditions change, yet existing test-time adaptation (TTA) studies mostly evaluate performance under static or mismatched noise conditions, which fail to capture the diversity of real-world degradations. To address this, we propose DHAuDS, a dynamic and heterogeneous audio domain shift benchmark designed for evaluating audio TTA. DHAuDS provides four standardized benchmarks (UrbanSound8K-C, SpeechCommandsV2-C, VocalSound-C, and ReefSet-C), each built with dynamic corruption severity levels and heterogeneous noise types. It defines 14 evaluation criteria per benchmark (8 for UrbanSound8K-C), for 50 distinct criteria in total, and includes dynamic and mixed-domain noise settings. We conduct 124 experiments. DHAuDS enables fair, reproducible, cross-domain comparison of TTA methods under diverse, realistic audio degradations.
📝 Abstract
Audio classifiers frequently face domain shift, where models trained on one dataset lose accuracy on data recorded in acoustically different conditions. Previous Test-Time Adaptation (TTA) research in speech and sound analysis often evaluates models under fixed or mismatched noise settings that fail to mimic real-world variability. To overcome these limitations, this paper presents DHAuDS (Dynamic and Heterogeneous Audio Domain Shift), a benchmark designed to assess TTA approaches under more realistic and diverse acoustic shifts. DHAuDS comprises four standardized benchmarks: UrbanSound8K-C, SpeechCommandsV2-C, VocalSound-C, and ReefSet-C, each constructed with dynamic corruption severity levels and heterogeneous noise types to simulate authentic audio degradation scenarios. The framework defines 14 evaluation criteria for each benchmark (8 for UrbanSound8K-C), resulting in 50 distinct criteria (124 experiments) that collectively enable fair, reproducible, and cross-domain comparison of TTA algorithms. By including dynamic and mixed-domain noise settings, DHAuDS offers a consistent and publicly reproducible testbed to support ongoing studies in robust and adaptive audio modeling.
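To make "dynamic corruption severity levels and heterogeneous noise types" concrete, the following is a minimal sketch of one way such a corrupted test stream could be generated. It is not the DHAuDS construction procedure, which the abstract does not specify. The noise types (`white_noise`, `hum_noise`), the SNR levels, the 16 kHz sample rate, and every function name here are assumptions chosen only for illustration.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the mixture has the requested SNR (dB)."""
    p_sig, p_noise = power(signal), power(noise)
    if p_noise == 0.0:
        return list(signal)
    # Pick the gain so that p_sig / (gain**2 * p_noise) == 10 ** (snr_db / 10).
    gain = math.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def white_noise(n, rng):
    """Hypothetical noise type: Gaussian white noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def hum_noise(n, rng, sample_rate=16000):
    """Hypothetical noise type: 50 Hz hum (sample rate is an assumption)."""
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return [math.sin(2.0 * math.pi * 50.0 * t / sample_rate + phase) for t in range(n)]


def corrupt_stream(clips, noise_bank, snr_levels_db, rng):
    """Corrupt each clip with a randomly drawn noise type and severity.

    Drawing the SNR per clip makes the severity dynamic; drawing the noise
    generator per clip makes the corruption heterogeneous.
    """
    corrupted = []
    for clip in clips:
        name = rng.choice(sorted(noise_bank))
        snr_db = rng.choice(snr_levels_db)
        noise = noise_bank[name](len(clip), rng)
        corrupted.append((name, snr_db, mix_at_snr(clip, noise, snr_db)))
    return corrupted
```

In this sketch a TTA method would be evaluated on the output of `corrupt_stream`, so the noise type and severity change from clip to clip rather than staying fixed for the whole test set.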