Hierarchical Graph Neural Network for Compressed Speech Steganalysis

📅 2025-07-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low detection accuracy, poor generalization, and high computational complexity in steganalysis of compressed speech, this paper applies GraphSAGE, a graph neural network, to this domain for the first time and proposes a hierarchical graph learning framework. VoIP speech frames serve as graph nodes, and inter-frame relationships are modeled to build a relational graph; neighbor aggregation then jointly models local fine-grained features and global structural patterns, capturing the statistical regularities induced by quantization index modulation (QIM) steganography. On 0.5-second speech segments, the method achieves detection accuracy above 98%, and 95.17% under low embedding rates, a 2.8% improvement over the best-performing state-of-the-art methods. Average detection time is 0.016 seconds per 0.5-second sample, an improvement of 0.003 seconds, which makes the approach suitable for real-time, online speech steganalysis.
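
The frame-graph idea and neighbor aggregation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the sliding-window edge rule, the helper names (`build_frame_graph`, `sage_mean_layer`), and the toy weights are all hypothetical, and the layer shown is the standard GraphSAGE mean aggregator rather than whatever variant the paper uses.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def build_frame_graph(num_frames: int, window: int = 1) -> Dict[int, List[int]]:
    """Link each frame (node) to frames at most `window` positions away.

    The sliding-window edge rule is an assumption made for illustration.
    """
    graph: Dict[int, List[int]] = {}
    for i in range(num_frames):
        lo = max(0, i - window)
        hi = min(num_frames - 1, i + window)
        graph[i] = [j for j in range(lo, hi + 1) if j != i]
    return graph


def matvec(weights: Matrix, vec: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def sage_mean_layer(
    features: List[Vector],
    graph: Dict[int, List[int]],
    w_self: Matrix,
    w_neigh: Matrix,
) -> List[Vector]:
    """One GraphSAGE layer with a mean aggregator:
    h_v' = ReLU(W_self * h_v + W_neigh * mean(h_u for u in N(v))).
    """
    out: List[Vector] = []
    for v, h_v in enumerate(features):
        neigh = graph[v]
        dim = len(h_v)
        if neigh:
            mean = [sum(features[u][k] for u in neigh) / len(neigh) for k in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighbor signal
        z = [a + b for a, b in zip(matvec(w_self, h_v), matvec(w_neigh, mean))]
        out.append([max(0.0, t) for t in z])  # ReLU
    return out


# Toy usage: four frames, one scalar feature per frame (hypothetical values).
graph = build_frame_graph(4, window=1)
hidden = sage_mean_layer([[1.0], [2.0], [3.0], [4.0]], graph, [[1.0]], [[1.0]])
```

Applying the layer twice lets each frame see its two-hop neighborhood, which is one plausible reading of how stacked aggregation moves from fine-grained to higher-level patterns.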

📝 Abstract
Steganalysis methods based on deep learning (DL) often struggle with computational complexity and with generalizing across different datasets. Incorporating a graph neural network (GNN) into steganalysis schemes makes it possible to leverage relational data for improved detection accuracy and adaptability. This paper presents the first application of a GNN, specifically the GraphSAGE architecture, to steganalysis of compressed voice over IP (VoIP) speech streams. The method involves straightforward graph construction from VoIP streams and employs GraphSAGE to capture hierarchical steganalysis information, including both fine-grained details and high-level patterns, thereby achieving high detection accuracy. Experimental results demonstrate that the developed approach performs well in uncovering quantization index modulation (QIM)-based steganographic patterns in VoIP signals. It achieves detection accuracy exceeding 98 percent even for short 0.5-second samples, and 95.17 percent accuracy under challenging conditions with low embedding rates, representing an improvement of 2.8 percent over the best-performing state-of-the-art methods. Furthermore, the model is efficient, with an average detection time as low as 0.016 seconds for 0.5-second samples, an improvement of 0.003 seconds. This makes it suitable for online steganalysis tasks, providing a better balance between detection accuracy and efficiency under the constraint of short samples with low embedding rates.
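
For readers unfamiliar with QIM, the following minimal sketch shows the general principle the detector targets: a secret bit is hidden by restricting the quantizer to one part of the codebook. The scalar codebook, the even/odd index partition, and the function names `qim_embed` and `qim_extract` are assumptions for illustration; real VoIP codecs use vector codebooks, and the source does not say which partition the studied schemes use.

```python
from typing import List


def qim_embed(value: float, bit: int, codebook: List[float]) -> int:
    """Return the index of the nearest codeword whose index parity equals `bit`.

    Partitioning the codebook by index parity is an assumed, simplified rule.
    """
    candidates = [i for i in range(len(codebook)) if i % 2 == bit]
    return min(candidates, key=lambda i: abs(codebook[i] - value))


def qim_extract(index: int) -> int:
    """Recover the hidden bit from the transmitted quantization index."""
    return index % 2


# Toy usage with a hypothetical 4-entry scalar codebook.
codebook = [0.0, 1.0, 2.0, 3.0]
cover_index = qim_embed(1.1, 1, codebook)  # nearest codeword overall is index 1
stego_index = qim_embed(1.1, 0, codebook)  # bit 0 forces a different index
```

Because embedding sometimes forces a codeword other than the nearest one, the sequence of indices across frames deviates from its usual statistics, and that deviation is the kind of pattern a steganalysis model tries to learn.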
Problem

Research questions and friction points this paper is trying to address.

Detects steganography in compressed VoIP using GraphSAGE
Improves accuracy for short samples with low embedding rates
Reduces computational complexity in deep learning steganalysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

GraphSAGE GNN for VoIP steganalysis
Hierarchical pattern capture in speech streams
High accuracy with low computational cost
Mustapha Hemis
LCPTS Laboratory, University of Sciences and Technology Houari Boumediene (USTHB), P.O. Box 32, El-Alia, Bab-Ezzouar, 16111, Algiers, Algeria
Hamza Kheddar
LSEA Laboratory, Department of Electrical Engineering, University of Medea, 26000, Algeria
Mohamed Chahine Ghanem
Associate Professor - London Metropolitan University | University of Liverpool
Cyber Security, Applied AI, IoT, Computer Vision, Digital Investigations
Bachir Boudraa
Full Professor in electronics, University of Science and Technology Houari Boumediene
speech processing, acoustics, signal processing, non-destructive control, FPGA