High-Fidelity Generative Audio Compression at 0.275kbps

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a novel generative audio compression (GAC) paradigm that shifts the objective from waveform fidelity to task-oriented effectiveness, addressing the severe distortion inherent in conventional and existing neural audio codecs at ultra-low bitrates. By synergizing semantic encoding at the transmitter with scalable generative synthesis at the receiver, GAC achieves high-fidelity audio reconstruction under extreme bandwidth constraints. The approach uniquely integrates generative modeling with information-theoretic principles, embodying the “more computation, less bandwidth” philosophy to transcend the limitations of traditional waveform reconstruction frameworks. Built upon the AI Flow architecture and powered by a 1.8-billion-parameter generative model, the system delivers high-fidelity 32 kHz audio reconstruction at just 0.275 kbps and maintains strong intelligibility even at 0.175 kbps—achieving a compression ratio of approximately 3,000× and significantly outperforming state-of-the-art neural codecs.

📝 Abstract
High-fidelity general audio compression at ultra-low bitrates is crucial for applications ranging from low-bandwidth communication to generative audio-language modeling. Traditional audio compression methods and contemporary neural codecs are fundamentally designed for waveform reconstruction. As a result, when operating at ultra-low bitrates, these methods degrade rapidly and often fail to preserve essential information, leading to severe acoustic artifacts and pronounced semantic distortion. To overcome these limitations, we introduce Generative Audio Compression (GAC), a novel paradigm that shifts the objective from signal fidelity to task-oriented effectiveness. Implemented within the AI Flow framework, GAC is theoretically grounded in the Law of Information Capacity. These foundations posit that abundant computational power at the receiver can offset extreme communication bottlenecks, exemplifying the More Computation, Less Bandwidth philosophy. By integrating semantic understanding at the transmitter with scalable generative synthesis at the receiver, GAC offloads the information burden to powerful model priors. Our 1.8B-parameter model achieves high-fidelity reconstruction of 32 kHz general audio at an unprecedented bitrate of 0.275 kbps. Even at 0.175 kbps, it still delivers intelligible audio, corresponding to a compression ratio of about 3000x, and significantly outperforms current state-of-the-art neural codecs in maintaining both perceptual quality and semantic consistency.
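As a sanity check of the quoted compression ratio, the figure below assumes the uncompressed reference is 16-bit mono PCM at 32 kHz (the abstract does not state the reference format, so this is an assumption):

```python
# Back-of-the-envelope check of the ~3000x compression-ratio claim.
# Assumption: uncompressed reference is 16-bit mono PCM at 32 kHz.
sample_rate_hz = 32_000
bits_per_sample = 16
pcm_kbps = sample_rate_hz * bits_per_sample / 1000  # 512.0 kbps

for coded_kbps in (0.275, 0.175):
    ratio = pcm_kbps / coded_kbps
    print(f"{coded_kbps} kbps -> about {ratio:.0f}x compression")
```

Under this assumption, 0.275 kbps corresponds to roughly 1860x and 0.175 kbps to roughly 2930x, consistent with the paper's "about 3000x" figure for the lower bitrate.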

Problem

Research questions and friction points this paper is trying to address.

audio compression
ultra-low bitrate
semantic distortion
high-fidelity
generative audio
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Audio Compression
ultra-low bitrate
semantic-aware compression
AI Flow framework
information capacity
Hao Ma
Ph.D. Candidate at NWPU; Research Intern at TeleAI; M.S. (SDU)
Audio signal processing
Ruihao Jing
Institute of Artificial Intelligence (TeleAI), China Telecom
Shansong Liu
TeleAI
Music AI, TTS, LLM, Multi-modal LLM, Audio codec
Cheng Gong
Institute of Artificial Intelligence (TeleAI), China Telecom
Chi Zhang
Institute of Artificial Intelligence (TeleAI), China Telecom
Xiao-Lei Zhang
Professor, Northwestern Polytechnical University, China
Speech Processing, Machine Learning, Signal Processing
Xuelong Li
Institute of Artificial Intelligence (TeleAI), China Telecom