🤖 AI Summary
Generative linguistic steganography faces a fundamental challenge: low-quality stegotext. Existing models have limited generative capacity, and conventional embedding algorithms treat the properties of sensitive information (e.g., semantic content, randomness) as noise, which forces the selection of low-probability tokens and degrades semantic coherence and fluency. To address this, we propose the character-based diffusion embedding algorithm (CDEA), which instead turns character-level statistical properties of sensitive information into a constructive signal. CDEA integrates power-law-based token grouping with diffusion-inspired frequency modulation over candidate words, raising the selection frequency of high-probability tokens and lowering that of low-probability ones. Combined with XLNet's long-context modeling, it preserves high extraction accuracy while improving perceptual imperceptibility. Experiments show that CDEA outperforms state-of-the-art methods on BLEU, perplexity, and human evaluation metrics.
📝 Abstract
Generating high-quality steganographic text is a fundamental challenge in generative linguistic steganography. The challenge has two main sources: first, existing models have limited text-generation capability; second, embedding algorithms fail to mitigate the negative effects of the properties of sensitive information, such as its semantic content or randomness. Specifically, to ensure that the recipient can accurately extract the hidden information, embedding algorithms often have to select candidate words with relatively low probabilities. As a result, fewer high-probability candidate words and more low-probability candidate words are selected, which compromises the semantic coherence and logical fluency of the steganographic text and lowers its overall quality. To address this issue, this paper proposes a novel embedding algorithm, the character-based diffusion embedding algorithm (CDEA). Unlike existing embedding algorithms, which try to eliminate the influence of sensitive information's properties on the generation process, CDEA exploits those properties. Using general statistical properties at the character level and a grouping method based on power-law distributions, it raises the selection frequency of high-probability candidate words in the candidate pool and lowers that of low-probability candidate words. Furthermore, to ensure that sensitive information is transformed effectively in long sequences, we introduce the XLNet model. Experimental results show that combining CDEA with XLNet significantly improves the quality of the generated steganographic text, particularly in terms of perceptual-imperceptibility.
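The intuition behind exploiting character-level statistics can be illustrated with a toy sketch. This is not the authors' CDEA: the abstract gives no algorithmic details, so the power-law grouping step and the XLNet model are omitted, and every name below (`build_rank_table`, `embed`, `extract`, the fixed candidate pool) is a hypothetical stand-in. The sketch only shows that, when frequent characters are mapped to high-probability candidate slots, a natural-language secret tends to select high-ranked candidates more often than a uniform choice would.

```python
from collections import Counter


def build_rank_table(reference_text):
    """Order characters by descending frequency in a shared reference text.

    Ties are broken by the character itself so that sender and receiver
    derive exactly the same ordering. Index in the returned list = rank.
    """
    counts = Counter(reference_text)
    return sorted(counts, key=lambda ch: (-counts[ch], ch))


def embed(secret, ordered_chars, candidates_per_step):
    """Pick, at each step, the candidate whose rank equals the character's rank.

    `candidates_per_step[i]` is assumed to be sorted by descending model
    probability (here a fixed toy list; a real system would query a language model).
    """
    rank_of = {ch: i for i, ch in enumerate(ordered_chars)}
    return [cands[rank_of[ch]] for ch, cands in zip(secret, candidates_per_step)]


def extract(tokens, ordered_chars, candidates_per_step):
    """Invert `embed`: recover each character from the rank of the chosen token."""
    return "".join(
        ordered_chars[cands.index(tok)]
        for tok, cands in zip(tokens, candidates_per_step)
    )


# Toy usage: the same candidate pool is reused at every step for simplicity.
reference = "meet me at the east gate at ten"
chars = build_rank_table(reference)
pool = [f"w{i}" for i in range(len(chars))]  # w0 = most probable candidate
secret = "meet at ten"
steps = [pool] * len(secret)

stego_tokens = embed(secret, chars, steps)
recovered = extract(stego_tokens, chars, steps)
mean_rank = sum(pool.index(t) for t in stego_tokens) / len(stego_tokens)
print(recovered, mean_rank)
```

In this toy setting the mean selected rank falls below the midpoint of the pool, because common characters such as the space, `e`, and `t` occupy the top slots. Each token here carries exactly one character, which a practical scheme would not require.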