🤖 AI Summary
Problem: Existing neural audio codecs typically rely on domain-specific codebooks for speech, music, or environmental sounds, which hinders a unified, efficient representation of general audio at ultra-low bitrates. Method: We propose a single-codebook neural codec for general audio, covering speech, music, and environmental sounds, that operates at ~700 bps and reconstructs 16 kHz audio. To reconcile the divergent modeling requirements of these domains, we introduce a Matryoshka codebook with nested domain-specific partitions, each distilled from a corresponding teacher model within a single training stage. A Conformer-style encoder-decoder operating on STFT features models heterogeneous audio types within one quantized latent space. Contribution/Results: Experiments show that the method achieves speech and general audio reconstruction quality comparable to state-of-the-art domain-specific single-layer quantizer codecs, demonstrating the feasibility of single-codebook vector quantization for general audio at ultra-low bitrates.
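The ~700 bps figure follows from simple arithmetic: a codec that emits one index from a K-entry codebook per frame spends log2(K) bits per frame, so its bitrate is the frame rate multiplied by log2(K). The minimal sketch below checks one configuration that lands exactly on 700 bps. The hop length of 320 samples and the 16,384-entry codebook are assumptions chosen for illustration, not values reported for AUV, and the helper names are hypothetical.

```python
import math


def frame_rate_hz(sample_rate_hz: int, hop_length: int) -> float:
    """Frames per second when the encoder advances `hop_length` samples per frame."""
    return sample_rate_hz / hop_length


def vq_bitrate_bps(frame_rate: float, codebook_size: int, num_codebooks: int = 1) -> float:
    """Bitrate of a VQ codec that emits one index per codebook per frame."""
    if codebook_size < 2:
        raise ValueError("codebook_size must be at least 2")
    return frame_rate * num_codebooks * math.log2(codebook_size)


# Hypothetical configuration (NOT taken from the paper): 16 kHz audio,
# a hop of 320 samples, and a single 16,384-entry (2**14) codebook.
rate = frame_rate_hz(16_000, 320)   # 50.0 frames per second
bps = vq_bitrate_bps(rate, 16_384)  # 50 * 14 = 700.0 bps
print(rate, bps)                    # 50.0 700.0
```

Other frame-rate and codebook-size pairs give the same bitrate (for example 70 Hz with a 1,024-entry codebook), so the figure alone does not pin down the codec's configuration.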
📝 Abstract
We propose AUV, a unified neural audio codec with a single codebook that delivers favourable speech reconstruction and further extends to general audio, including vocals, music, and sound. AUV can handle any 16 kHz mixed-domain audio segment at bit rates around 700 bps. To accomplish this, we guide a matryoshka codebook with nested domain-specific partitions, each assigned a corresponding teacher model for distillation, all within a single training stage. We employ a Conformer-style encoder-decoder architecture with STFT features as the audio representation, which improves audio quality. Comprehensive evaluations demonstrate that AUV achieves audio reconstruction comparable to state-of-the-art domain-specific single-layer quantizer codecs, showcasing the potential of audio universal vector quantization with a single codebook. The pre-trained model and demo samples are available at https://swivid.github.io/AUV/.
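The nested-partition idea can be illustrated with a toy nearest-neighbour quantizer. In this minimal sketch, which is an assumption rather than AUV's implementation, the single codebook is an ordered list of entries and each domain may only select from a prefix of it, so smaller partitions sit inside larger ones. The class and function names (`MatryoshkaCodebook`, `cosine_distillation_loss`), the domain names, the prefix sizes, and the cosine form of the distillation loss are all hypothetical; the actual partition layout, teacher models, and loss are not specified in the abstract. Training details such as straight-through gradients and codebook updates are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class MatryoshkaCodebook:
    """Hypothetical single codebook split into nested prefix partitions.

    A domain with prefix size k may only use entries 0..k-1, so every
    smaller partition is contained in every larger one.
    """

    def __init__(self, entries: List[List[float]], prefix_sizes: Dict[str, int]):
        for domain, size in prefix_sizes.items():
            if not 0 < size <= len(entries):
                raise ValueError(f"invalid prefix size for {domain!r}: {size}")
        self.entries = entries
        self.prefix_sizes = prefix_sizes

    def quantize(self, frame: Vector, domain: str) -> Tuple[int, List[float]]:
        """Return (index, code vector) of the nearest entry within the domain's prefix."""
        size = self.prefix_sizes[domain]
        index = min(range(size), key=lambda i: squared_distance(frame, self.entries[i]))
        return index, self.entries[index]


def cosine_distillation_loss(student: Vector, teacher: Vector) -> float:
    """Assumed distillation objective: 1 - cosine similarity to a teacher feature."""
    dot = sum(s * t for s, t in zip(student, teacher))
    norm = math.sqrt(sum(s * s for s in student)) * math.sqrt(sum(t * t for t in teacher))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return 1.0 - dot / norm


# Toy usage: four 2-D entries; the "speech" partition is the first two.
entries = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
codebook = MatryoshkaCodebook(entries, {"speech": 2, "general": 4})
print(codebook.quantize([-0.9, 0.1], "general"))  # (2, [-1.0, 0.0])
print(codebook.quantize([-0.9, 0.1], "speech"))   # (1, [0.0, 1.0])
```

In this sketch, restricting the same frame to the "speech" prefix forces it onto a different code than the full codebook would choose, while all domains still share one index space; a per-partition teacher loss such as the cosine term above is one plausible way each partition could be steered toward its domain.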