🤖 AI Summary
Existing neural audio watermarking methods share three key limitations: no key control, conflicts during multi-round embedding, and insufficient support for variable-length watermarks. To address these, this paper proposes WAKE, the first key-controllable end-to-end neural audio watermarking framework. The method binds embedding and decoding to a secret key via key-conditioned modulation and robust frequency-domain feature modeling, so that only authorized users holding the key can decode the watermark. It recovers the originally embedded watermark after multiple embedding rounds and accommodates watermarks of arbitrary length. Extensive experiments show that the approach outperforms state-of-the-art methods in perceptual quality (PESQ/STOI) and achieves over 99.2% watermark detection accuracy under strong distortions, including compression, resampling, and additive noise, which improves security, robustness, and practical applicability.
📝 Abstract
As deep learning advances in audio generation, challenges in audio security and copyright protection highlight the need for robust audio watermarking. Recent neural network-based methods have made progress but still face three main issues: preventing unauthorized access, decoding the initial watermark after multiple embeddings, and embedding watermarks of varying lengths. To address these issues, we propose WAKE, the first key-controllable audio watermarking framework. WAKE embeds a watermark using a specific key and recovers it only with the corresponding key, enhancing security by making decoding with an incorrect key impossible. It also resolves the overwriting issue by allowing earlier watermarks to be decoded after multiple embeddings, and it supports variable-length watermark insertion. WAKE outperforms existing models in both watermarked audio quality and watermark detection accuracy. Code, more results, and demo page: https://thuhcsi.github.io/WAKE.
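To make the three properties in the abstract concrete (key-bound decoding, surviving a second embedding, and variable watermark length), the following is a minimal toy sketch. It is not WAKE's neural architecture: it substitutes classical spread-spectrum watermarking with a key-seeded pseudo-random carrier. All names and constants here (`CHIPS_PER_BIT`, `STRENGTH`, `make_tone`, `carrier`, `embed`, `decode`) are hypothetical and chosen only for illustration.

```python
import math
import random

CHIPS_PER_BIT = 1024  # hypothetical spreading factor (samples per watermark bit)
STRENGTH = 0.05       # hypothetical embedding amplitude; far too loud for real use


def make_tone(n, freq=440.0, rate=16000, amp=0.2):
    """Stand-in host signal: a plain sine tone of n samples."""
    return [amp * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


def carrier(key, n):
    """Key-seeded pseudo-random +/-1 sequence; the same key yields the same carrier."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(audio, bits, key):
    """Add each bit (mapped to +/-1) times the keyed carrier onto its own segment."""
    n = len(bits) * CHIPS_PER_BIT
    if len(audio) < n:
        raise ValueError("audio too short for this watermark")
    chips = carrier(key, n)
    out = list(audio)
    for i in range(n):
        sign = 1.0 if bits[i // CHIPS_PER_BIT] else -1.0
        out[i] += STRENGTH * sign * chips[i]
    return out


def decode(audio, num_bits, key):
    """Correlate each segment with the keyed carrier; the sign of the sum gives the bit."""
    chips = carrier(key, num_bits * CHIPS_PER_BIT)
    bits = []
    for b in range(num_bits):
        start = b * CHIPS_PER_BIT
        corr = sum(audio[i] * chips[i] for i in range(start, start + CHIPS_PER_BIT))
        bits.append(1 if corr > 0 else 0)
    return bits


host = make_tone(32 * CHIPS_PER_BIT)
msg_a = [1, 0, 1, 1, 0, 0, 1, 0] * 4   # 32-bit watermark
msg_b = [0, 1, 1, 0, 1, 0, 0, 1]       # 8-bit watermark
once = embed(host, msg_a, key=1234)    # first embedding round
twice = embed(once, msg_b, key=9999)   # second round with a different key
```

In this toy setup, `decode(twice, 32, key=1234)` and `decode(twice, 8, key=9999)` should each return the message embedded under that key, because carriers seeded with different keys are nearly uncorrelated, while decoding with an unrelated key yields essentially random bits. Unlike the framework described in the abstract, this sketch needs the watermark length at decode time and has no robustness to compression or resampling.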