🤖 AI Summary
Traditional audio front-ends—whether hand-crafted feature extractors or fixed-architecture learnable front-ends—struggle to dynamically adapt to diverse and time-varying acoustic environments, limiting their robustness. This work first systematically establishes the necessity of environment-adaptive audio front-ends. We propose Ada-FE, a differentiable spectral front-end based on neural adaptive feedback control, which enables online acoustic adaptation by real-time modulation of filter Q-factors during spectrogram decomposition. Evaluated across three core tasks—automatic speech recognition, sound event detection, and music analysis—Ada-FE consistently outperforms state-of-the-art learnable front-ends under various downstream neural backbones. It exhibits high training stability and strong cross-domain generalization. Our approach introduces a novel paradigm for robust audio representation learning grounded in adaptive signal processing principles.
📝 Abstract
Hand-crafted features, such as Mel-filterbanks, have traditionally been the choice for many audio processing applications. Recently, there has been a growing interest in learnable front-ends that extract representations directly from the raw audio waveform. However, both hand-crafted filterbanks and current learnable front-ends lead to fixed computation graphs at inference time, failing to dynamically adapt to varying acoustic environments, a key feature of human auditory systems. To this end, we explore the question of whether audio front-ends should be adaptive by comparing the Ada-FE front-end (a recently developed adaptive front-end that employs a neural adaptive feedback controller to dynamically adjust the Q-factors of its spectral decomposition filters) to established learnable front-ends. Specifically, we systematically investigate learnable front-ends and Ada-FE across two commonly used back-end backbones and a wide range of audio benchmarks spanning speech, sound events, and music. The comprehensive results show that our Ada-FE outperforms advanced learnable front-ends, and more importantly, it exhibits impressive stability and robustness on test samples across training epochs.
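To make the feedback-control idea concrete, the sketch below shows a toy adaptive front-end for a single bandpass channel: each frame is filtered, the output energy is measured, and a feedback rule maps that energy to a new Q-factor for the next frame (high energy widens the band by lowering Q, and vice versa). This is an illustrative stand-in only; the actual Ada-FE uses a learned neural controller and a full spectral decomposition filterbank, and the `biquad_bandpass`, `filter_frame`, and `adaptive_front_end` helpers, the sigmoid feedback law, and all parameter values here are our own assumptions, not the paper's design.

```python
import numpy as np

def biquad_bandpass(fc, q, fs):
    # Standard (RBJ audio-EQ cookbook) constant 0 dB peak-gain bandpass biquad.
    w0 = 2.0 * np.pi * fc / fs
    alpha = np.sin(w0) / (2.0 * q)  # bandwidth shrinks as Q grows
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1.0 + alpha, -2.0 * np.cos(w0), 1.0 - alpha])
    return b / a[0], a / a[0]

def filter_frame(frame, b, a):
    # Direct-form I IIR filtering of one frame (state reset per frame for simplicity).
    y = np.zeros_like(frame)
    x1 = x2 = y1 = y2 = 0.0
    for n, x in enumerate(frame):
        y[n] = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y[n]
    return y

def adaptive_front_end(signal, fs=16000, fc=1000.0, frame_len=160,
                       q_min=1.0, q_max=20.0, gain=5.0):
    # Per-frame loop: filter -> measure output energy -> feed back a new Q.
    # The "controller" is a hand-crafted sigmoid rule standing in for the
    # neural adaptive feedback controller described in the abstract.
    q = 0.5 * (q_min + q_max)
    qs = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        b, a = biquad_bandpass(fc, q, fs)
        e = float(np.mean(filter_frame(frame, b, a) ** 2))
        # Feedback law: energy above threshold pushes Q toward q_min (wider band).
        q = q_min + (q_max - q_min) / (1.0 + np.exp(gain * (e - 0.01)))
        qs.append(q)
    return np.array(qs)
```

Because the sigmoid keeps Q strictly inside `[q_min, q_max]`, the filter coefficients remain stable regardless of the input, which loosely mirrors the training-stability property the abstract highlights for the learned controller.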