🤖 AI Summary
To address low detection accuracy, poor generalization, and high computational complexity in steganalysis of compressed speech, this paper applies GraphSAGE, a graph neural network, to this domain for the first time and proposes a hierarchical graph learning framework. VoIP speech frames serve as graph nodes, and inter-frame relationships define the edges of a relational graph. Neighbor aggregation then models local fine-grained features jointly with global structural patterns, capturing the statistical regularities induced by quantization index modulation (QIM) steganography. On 0.5-second speech segments, the method exceeds 98% detection accuracy and reaches 95.17% at low embedding rates, an improvement of 2.8 percentage points over the best state-of-the-art methods. Average inference time is 0.016 seconds per 0.5-second sample, which makes the method suitable for real-time, online speech steganalysis.
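The graph construction and neighbor aggregation described above can be illustrated with a minimal sketch. It is not the paper's implementation: the sliding-window edge rule, the `window` parameter, the feature dimensions, the weight matrices, and the function names `build_frame_graph` and `sage_mean_layer` are all assumptions made for illustration. The layer follows the generic GraphSAGE mean-aggregator form, ReLU(W_self · h_v + W_neigh · mean of neighbor features).

```python
def build_frame_graph(num_frames, window=2):
    """Connect each frame (node) to the frames at most `window` steps away.

    The windowed edge rule is an assumption; the source only says that
    inter-frame relationships are modeled.
    """
    neighbors = {i: [] for i in range(num_frames)}
    for i in range(num_frames):
        lo = max(0, i - window)
        hi = min(num_frames, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                neighbors[i].append(j)
    return neighbors


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_mean_layer(features, neighbors, W_self, W_neigh):
    """One GraphSAGE-style layer with a mean aggregator and ReLU."""
    dim = len(features[0])
    out = []
    for v, h_v in enumerate(features):
        nbrs = neighbors[v]
        if nbrs:
            mean = [sum(features[u][k] for u in nbrs) / len(nbrs)
                    for k in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighbor signal
        z = [a + b for a, b in zip(matvec(W_self, h_v), matvec(W_neigh, mean))]
        out.append([max(0.0, t) for t in z])
    return out


# Toy usage: three frames with hypothetical 2-dimensional features.
graph = build_frame_graph(3, window=1)
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = sage_mean_layer([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                         graph, identity, identity)
```

Stacking several such layers lets each node see progressively larger neighborhoods, which is one way to read the "local fine-grained features" and "global structural patterns" mentioned above. A trained model would learn the weight matrices and pool node embeddings into a single cover/stego decision.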
📝 Abstract
Steganalysis methods based on deep learning (DL) often suffer from high computational complexity and generalize poorly across datasets. Incorporating a graph neural network (GNN) into steganalysis schemes makes it possible to leverage relational data for improved detection accuracy and adaptability. This paper presents the first application of a GNN, specifically the GraphSAGE architecture, to steganalysis of compressed voice over IP (VoIP) speech streams. The method constructs graphs from VoIP streams in a straightforward way and employs GraphSAGE to capture hierarchical steganalysis information, including both fine-grained details and high-level patterns, thereby achieving high detection accuracy. Experimental results demonstrate that the developed approach performs well in uncovering quantization index modulation (QIM)-based steganographic patterns in VoIP signals. It achieves detection accuracy exceeding 98 percent even for short 0.5-second samples, and 95.17 percent accuracy under challenging conditions with low embedding rates, representing an improvement of 2.8 percent over the best-performing state-of-the-art methods. Furthermore, the model is efficient, with an average detection time as low as 0.016 seconds for 0.5-second samples, an improvement of 0.003 seconds. This makes it well suited to online steganalysis tasks, providing a better balance between detection accuracy and efficiency under the constraint of short samples with low embedding rates.
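For readers unfamiliar with QIM, the following sketch shows the basic idea in its simplest scalar form: each message bit selects one of two interleaved quantization lattices, and the receiver recovers the bit by finding the nearer lattice. This is a simplification for illustration only. In compressed speech, QIM is usually applied to vector-quantization codebook indices rather than to scalar values, and the function names and the `step` parameter here are hypothetical.

```python
def qim_embed(x, bit, step=1.0):
    """Quantize x onto the lattice selected by `bit` (0 or 1).

    Lattice 0 sits at multiples of `step`; lattice 1 is shifted by step / 2.
    """
    offset = bit * step / 2.0
    return round((x - offset) / step) * step + offset


def qim_extract(y, step=1.0):
    """Return the bit whose lattice has the point closest to y."""
    d0 = abs(y - qim_embed(y, 0, step))
    d1 = abs(y - qim_embed(y, 1, step))
    return 0 if d0 <= d1 else 1


# Toy usage: hide one bit in a single value and read it back.
stego_value = qim_embed(3.2, 1)
recovered_bit = qim_extract(stego_value)
```

Because embedding constrains which quantization points may be chosen, it shifts the statistics of the quantized values across frames. That kind of regularity is what a steganalysis model, such as the GraphSAGE-based detector described above, tries to detect.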