UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the perceptual quality bottleneck imposed by speech codecs limited to wideband (WB) audio with 8 kHz of bandwidth, this paper proposes a lightweight, modular GAN-based architecture for blind and guided bandwidth extension (BWE) to super-wideband (SWB) audio with 16 kHz of bandwidth. Methodologically, it introduces the first general-purpose GAN framework operating in the subband domain, integrating quantized side-information encoding with conditional generation. This enables generalization across codecs (conventional and neural) and across bitrates without retraining. Key contributions include: (i) an end-to-end plug-and-play BWE solution, where the guided mode incurs <1 kbps additional overhead; (ii) significant subjective improvements in naturalness and clarity; and (iii) strong compatibility and robustness validated across diverse WB codecs, including state-of-the-art neural codecs.

📝 Abstract

In practical applications of speech codecs, a multitude of factors, such as the quality of the radio connection, hardware limitations, or the required user experience, necessitate trade-offs between achievable perceptual quality, resulting bitrate, and computational complexity. Most conventional and neural speech codecs operate on wideband (WB) speech signals to achieve this compromise. To further enhance the perceptual quality of coded speech, bandwidth extension (BWE) of the transmitted speech is an attractive and popular technique in conventional speech coding. In contrast, neural speech codecs are typically trained end-to-end to a specific set of requirements and are often not easily adaptable. In particular, they are typically trained to operate at a single fixed sampling rate. With the Universal Bandwidth Extension Generative Adversarial Network (UBGAN), we propose a modular and lightweight GAN-based solution that increases the operational flexibility of a wide range of conventional and neural codecs. Our model operates in the subband domain and extends the bandwidth of WB signals from 8 kHz to 16 kHz, resulting in super-wideband (SWB) signals. We further introduce two variants, guided-UBGAN and blind-UBGAN: the guided version transmits a quantized learned representation as side information at a very low bitrate in addition to the bitrate of the codec, while the blind version operates without such side information. Our subjective assessments demonstrate the advantage of UBGAN applied to WB codecs and highlight the generalization capacity of our proposed method across multiple codecs and bitrates.
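To make "operating in the subband domain" concrete, the following is a minimal sketch of a two-band split. It assumes a toy Haar filterbank, which is not the filterbank used in the paper (the abstract does not specify one); the function names and the 1 kHz test tone are likewise illustrative. By the Nyquist relation, an audio bandwidth of 16 kHz corresponds to a 32 kHz sampling rate, and each of the two subbands then runs at 16 kHz.

```python
import math

SWB_RATE_HZ = 32000  # 16 kHz audio bandwidth implies a 32 kHz sampling rate (Nyquist)


def haar_analysis(x):
    """Split an even-length signal into low and high subbands at half the rate.

    Toy Haar filterbank, used here only as a stand-in for a real subband transform.
    """
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high


def haar_synthesis(low, high):
    """Invert haar_analysis: interleave the two subbands back into one signal."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for lo, hi in zip(low, high):
        x.append((lo + hi) * s)
        x.append((lo - hi) * s)
    return x


def energy(x):
    return sum(v * v for v in x)


# Hypothetical test signal: a 1 kHz tone, 20 ms long, sampled at the SWB rate.
swb = [math.sin(2 * math.pi * 1000 * n / SWB_RATE_HZ) for n in range(640)]
low, high = haar_analysis(swb)

# With both subbands available, the signal is recovered exactly.
rebuilt = haar_synthesis(low, high)

# A WB codec delivers only the low band; a BWE model would have to predict
# `high` instead of the zeros used in this placeholder.
low_band_only = haar_synthesis(low, [0.0] * len(high))
```

In this picture, a blind extension would estimate the high subband from the low subband alone, while a guided extension would additionally receive a few transmitted bits describing it.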

Problem

Research questions and friction points this paper is trying to address.

Enhancing perceptual quality of coded speech via bandwidth extension

Increasing operational flexibility of conventional and neural codecs

Extending the audio bandwidth of WB signals from 8 kHz to 16 kHz to obtain SWB signals

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular GAN-based bandwidth extension solution

Extends WB signals (8 kHz bandwidth) to SWB signals (16 kHz bandwidth)

Offers guided and blind BWE variants
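The cost of the guided variant is the bitrate of its side information, which adds to the bitrate of the underlying codec. The sketch below shows that arithmetic under stated assumptions: the codebook size, frame length, and codec bitrate are hypothetical values chosen for illustration, not figures from the paper, and `side_info_bitrate_bps` is an illustrative helper rather than part of any published implementation.

```python
import math


def side_info_bitrate_bps(codebook_size, codebooks_per_frame, frame_ms):
    """Bitrate of quantized side information sent once per frame.

    Each codebook index costs ceil(log2(codebook_size)) bits.
    """
    bits_per_index = math.ceil(math.log2(codebook_size))
    bits_per_frame = codebooks_per_frame * bits_per_index
    frames_per_second = 1000.0 / frame_ms
    return bits_per_frame * frames_per_second


# Hypothetical configuration: one 1024-entry codebook index every 20 ms.
guided_overhead_bps = side_info_bitrate_bps(
    codebook_size=1024, codebooks_per_frame=1, frame_ms=20
)

# Hypothetical WB codec bitrate; the guided total is codec plus side information,
# while the blind variant adds nothing to the codec bitrate.
codec_bps = 6000
guided_total_bps = codec_bps + guided_overhead_bps
blind_total_bps = codec_bps
```

With these assumed values, the side information costs 10 bits per 20 ms frame, or 500 bits per second on top of the codec bitrate.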
