WAKE: Watermarking Audio with Key Enrichment

📅 2025-06-06
🤖 AI Summary
Existing audio watermarking methods suffer from three key limitations: lack of key control, conflicts during multi-round embedding, and insufficient support for variable-length watermarks. To address these, the paper proposes WAKE, which it presents as the first key-controllable end-to-end neural watermarking framework. The method binds embedding and decoding to a secret key via key-conditioned modulation and robust frequency-domain feature modeling, so that only users holding the correct key can decode the watermark. It is designed to recover the original watermark after multiple embedding rounds and to accommodate watermarks of arbitrary length. The reported experiments show the approach outperforming state-of-the-art methods in perceptual fidelity (PESQ/STOI) and achieving over 99.2% watermark detection accuracy under strong distortions, including compression, resampling, and additive noise.

📝 Abstract
As deep learning advances in audio generation, challenges in audio security and copyright protection highlight the need for robust audio watermarking. Recent neural network-based methods have made progress but still face three main issues: preventing unauthorized access, decoding initial watermarks after multiple embeddings, and embedding varying lengths of watermarks. To address these issues, we propose WAKE, the first key-controllable audio watermark framework. WAKE embeds watermarks using specific keys and recovers them with corresponding keys, enhancing security by making incorrect key decoding impossible. It also resolves the overwriting issue by allowing watermark decoding after multiple embeddings and supports variable-length watermark insertion. WAKE outperforms existing models in both watermarked audio quality and watermark detection accuracy. Code, more results, and demo page: https://thuhcsi.github.io/WAKE.
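
The key-controlled behaviour described in the abstract (embed with a key, recover only with the corresponding key) can be illustrated with a toy sketch. This is not WAKE's neural method: it swaps in classical key-seeded spread-spectrum watermarking, and every name in it (`carriers`, `embed`, `decode`, `sine_host`, the `strength` value) is hypothetical. The strength is chosen so that decoding is reliable in the demo, not so that the watermark is inaudible.

```python
import math
import random


def carriers(key, n_bits, n_samples):
    """Derive one pseudo-random +/-1 carrier per watermark bit from the key."""
    rng = random.Random(key)  # a str seed gives a reproducible sequence
    return [[1.0 if rng.random() < 0.5 else -1.0 for _ in range(n_samples)]
            for _ in range(n_bits)]


def embed(audio, bits, key, strength=0.02):
    """Add each bit's carrier to the audio, sign-flipped when the bit is 0."""
    marked = list(audio)
    for bit, carrier in zip(bits, carriers(key, len(bits), len(audio))):
        sign = strength if bit else -strength
        for i, c in enumerate(carrier):
            marked[i] += sign * c
    return marked


def decode(audio, n_bits, key):
    """Correlate the audio with each key-derived carrier; the sign gives the bit."""
    return [1 if sum(a * c for a, c in zip(audio, carrier)) > 0 else 0
            for carrier in carriers(key, n_bits, len(audio))]


def sine_host(n_samples=8000, freq=440.0, rate=16000.0, amp=0.1):
    """Hypothetical stand-in for real audio: a plain sine tone."""
    return [amp * math.sin(2 * math.pi * freq * t / rate) for t in range(n_samples)]


if __name__ == "__main__":
    host = sine_host()
    bits = [1, 0, 1, 1, 0, 0, 1, 0] * 4  # a 32-bit watermark
    marked = embed(host, bits, key="secret-key")
    print("correct key recovers bits:", decode(marked, len(bits), "secret-key") == bits)
    print("wrong key recovers bits:  ", decode(marked, len(bits), "wrong-key") == bits)
```

In this toy the watermark length is simply `len(bits)`, and the decoder has to be told that length. Carriers derived from different keys are nearly uncorrelated, so a second embedding under another key usually leaves the first watermark decodable; that is a property of spread-spectrum carriers and says nothing about how WAKE handles multiple embeddings.
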
Problem

Research questions and friction points this paper is trying to address.

Preventing unauthorized access to embedded watermarks
Decoding the initial watermark after multiple embeddings
Embedding watermarks of varying lengths
Innovation

Methods, ideas, or system contributions that make the work stand out.

Key-controllable audio watermark framework
Prevents unauthorized decoding: only the corresponding key recovers the watermark
Supports variable-length watermark insertion
Authors

Yaoxun Xu
Tsinghua University

Jianwei Yu
Tencent AI Lab
ASR

Hangting Chen
Tencent Hunyuan
signal processing, speech separation, DCASE

Zhiyong Wu
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; The Chinese University of Hong Kong, Hong Kong SAR, China

Xixin Wu
The Chinese University of Hong Kong

Dong Yu
Tencent AI Lab, China

Rongzhi Gu
Tencent AI Lab
Speech separation

Yi Luo
Tencent AI Lab, China