FreStega: A Plug-and-Play Method for Boosting Imperceptibility and Capacity in Generative Linguistic Steganography for Real-World Scenarios

📅 2024-12-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large language model (LLM)-based steganography methods suffer from poor imperceptibility, low embedding capacity, vulnerability to detection, and deviation from natural text distributions. To address these limitations, we propose a plug-and-play dual-dimensional language model distribution reconstruction framework: it introduces instantaneous entropy–adaptive temperature scaling along the temporal dimension and designs a domain-aware probability distribution alignment mechanism along the spatial dimension, enabling dynamic recalibration of token probabilities during generation. Our approach integrates probability reweighting with distribution-preserving steganographic design, maintaining textual quality while significantly enhancing robustness against mainstream steganalyzers. Experimental results show a 15.41% increase in embedding capacity and improved resistance to statistical and deep-learning-based detection. This work establishes a new paradigm for generative language steganography that jointly optimizes security and practicality.
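The summary's "instantaneous entropy-adaptive temperature scaling" can be illustrated with a small sketch: when the next-token distribution has low entropy (the LLM is near-deterministic), a higher temperature flattens it to recover embedding capacity. The parameter names, the linear interpolation rule, and the target-entropy threshold below are illustrative assumptions, not the paper's exact formulation.

```python
import math

def adaptive_temperature(probs, t_min=0.8, t_max=1.5, target_entropy=3.0):
    """Pick a temperature from the instantaneous entropy of the
    next-token distribution: low-entropy steps get a temperature
    near t_max to flatten the distribution; high-entropy steps
    stay near t_min. Interpolation rule is a hypothetical sketch."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    ratio = min(entropy / target_entropy, 1.0)
    return t_max - (t_max - t_min) * ratio

def rescale(probs, temperature):
    """Apply temperature scaling to a probability vector and renormalize."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]
```

For a peaked distribution such as `[0.9, 0.05, 0.05]`, the low entropy yields a temperature above 1, and rescaling reduces the mass on the dominant token, increasing the entropy available to the steganographic encoder.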

📝 Abstract
Linguistic steganography embeds secret information in seemingly innocent texts, safeguarding privacy in surveillance environments. Generative linguistic steganography leverages the probability distribution of language models (LMs) and applies steganographic algorithms to generate stego tokens, gaining attention with recent Large Language Model (LLM) advancements. To enhance security, researchers develop distribution-preserving stego algorithms to minimize the gap between stego sampling and LM sampling. However, the reliance on language model distributions, coupled with deviations from real-world cover texts, results in insufficient imperceptibility when facing steganalysis detectors in real-world scenarios. Moreover, LLM distributions tend to be more deterministic, resulting in reduced entropy and, consequently, lower embedding capacity. In this paper, we propose FreStega, a plug-and-play method to reconstruct the distribution of language models used for generative linguistic steganography. FreStega dynamically adjusts token probabilities from the language model at each step of stegotext auto-regressive generation, leveraging both sequential and spatial dimensions. In sequential adjustment, the temperature is dynamically adjusted based on instantaneous entropy, enhancing the diversity of stego texts and boosting embedding capacity. In the spatial dimension, the distribution is aligned with guidance from the target domain corpus, closely mimicking real cover text in the target domain. By reforming the distribution, FreStega enhances the imperceptibility of stego text in practical scenarios and improves steganographic capacity by 15.41%, all without compromising the quality of the generated text. FreStega serves as a plug-and-play remedy to enhance the imperceptibility and embedding capacity of existing distribution-preserving steganography methods in real-world scenarios.
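The spatial-dimension adjustment described above — aligning the LM distribution with guidance from a target-domain corpus — can be sketched as a multiplicative reweighting of LM probabilities by domain token statistics. The `alpha` exponent, the unigram-frequency guidance signal, and all names here are illustrative assumptions; the paper's alignment mechanism may take a different form.

```python
def align_to_domain(lm_probs, domain_freqs, alpha=0.3, eps=1e-8):
    """Bias the LM's next-token distribution toward token statistics
    estimated from a target-domain corpus (hypothetical sketch).
    lm_probs, domain_freqs: dict mapping token -> probability.
    Tokens unseen in the domain corpus get a small floor eps."""
    reweighted = {
        tok: p * (domain_freqs.get(tok, eps) ** alpha)
        for tok, p in lm_probs.items()
    }
    total = sum(reweighted.values())
    return {tok: w / total for tok, w in reweighted.items()}
```

Under this sketch, tokens common in the target domain gain probability mass, so sampled stego tokens statistically resemble real cover text from that domain rather than generic LM output.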
Problem

Research questions and friction points this paper is trying to address.

Steganography
Large Language Models
Information Hiding
Innovation

Methods, ideas, or system contributions that make the work stand out.

FreStega
Steganography
Text Probability Adjustment