🤖 AI Summary
This work addresses the sharp loss of robustness that existing text watermarking methods suffer under cross-lingual back-translation attacks in low-resource languages such as Bangla (Bengali). It presents the first systematic evaluation of three mainstream watermarking algorithms (KGW, EXP, and Waterfall) on Bangla text generated by large language models. To improve resilience, the authors propose a training-free layered watermarking mechanism that combines an embedding-time watermark with a post-generation watermark. With this approach, detection accuracy after back-translation rises from 9–13% to 40–50%, a 3–4× relative gain, at the cost of a controlled degradation in semantic quality. The result is a more robust watermark for low-resource language settings.
📝 Abstract
As large language models (LLMs) are increasingly deployed for text generation, watermarking has become essential for authorship attribution, intellectual property protection, and misuse detection. While existing watermarking methods perform well in high-resource languages, their robustness in low-resource languages remains underexplored. This work presents the first systematic evaluation of three state-of-the-art text watermarking methods (KGW, Exponential Sampling (EXP), and Waterfall) for Bangla LLM text generation under cross-lingual round-trip translation (RTT) attacks. Under benign conditions, KGW and EXP achieve high detection accuracy (>88%) with negligible perplexity and ROUGE degradation. However, RTT causes detection accuracy to collapse to 9-13%, indicating a fundamental failure of token-level watermarking. To address this, we propose a layered watermarking strategy that combines embedding-time and post-generation watermarks. Experimental results show that layered watermarking improves post-RTT detection accuracy by 25-35 percentage points, reaching 40-50% accuracy, a 3$\times$ to 4$\times$ relative improvement over single-layer methods, at the cost of controlled semantic degradation. Our findings quantify the robustness-quality trade-off in multilingual watermarking and establish layered watermarking as a practical, training-free solution for low-resource languages such as Bangla. Our code and data will be made public.
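To illustrate why a token-level watermark is fragile under RTT, the following is a minimal sketch of a KGW-style green-list detector. It is not the paper's implementation: the hash-based vocabulary split, the `demo-key` secret, the green fraction `GAMMA = 0.5`, and the helper names `is_green`, `z_score`, and `generate_marked` are all assumptions made for illustration. The detector counts how many tokens fall in the "green" set seeded by the preceding token and computes the z-score z = (g - γT) / sqrt(Tγ(1 - γ)), where g is the green count and T the number of scored tokens.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary marked "green" at each step


def is_green(prev_token: str, token: str, key: str = "demo-key") -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    return digest[0] / 256.0 < GAMMA


def z_score(tokens: list[str], key: str = "demo-key") -> float:
    """One-sided z-statistic for 'more green tokens than chance would give'."""
    scored = len(tokens) - 1  # the first token has no predecessor to seed the split
    if scored <= 0:
        return 0.0
    green = sum(is_green(prev, cur, key) for prev, cur in zip(tokens, tokens[1:]))
    return (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))


def generate_marked(length: int, vocab: list[str], key: str = "demo-key") -> list[str]:
    """Toy 'hard' watermark: always emit the first green token found in the vocabulary."""
    tokens = [vocab[0]]
    while len(tokens) < length:
        prev = tokens[-1]
        # Fall back to vocab[0] in the unlikely case that no candidate is green.
        nxt = next((tok for tok in vocab if is_green(prev, tok, key)), vocab[0])
        tokens.append(nxt)
    return tokens


if __name__ == "__main__":
    vocab = [f"w{i}" for i in range(50)]
    marked = generate_marked(200, vocab)
    print("z-score of watermarked sequence:", round(z_score(marked), 2))
```

Because the green-list membership of each token depends on the exact identity of that token and its predecessor, a round trip through another language that replaces most tokens leaves the green count near its chance level of γT, which is consistent with the collapse in detection accuracy reported above.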