🤖 AI Summary
This study addresses the challenge of reliably distinguishing encrypted from compressed data in very short byte fragments (512–2048 bytes), where traditional byte-statistics methods and single-modality learning approaches often fail. The authors propose Triumvir, an architecture that drops the single-view assumption by integrating three complementary representations of the raw bytes (statistical, sequential, and spatial) and combining them through an uncertainty-aware multimodal ensemble. Compared with state-of-the-art methods, Triumvir improves accuracy by up to 4.5 percentage points in binary classification and up to 6.4 percentage points in multiclass classification. Ablation studies show that combining all modalities yields up to 5 percentage points over partial configurations, which underscores the role of multi-perspective representations in identifying data types under low-information conditions.
📝 Abstract
Reliable identification of encrypted data fragments is essential in cybersecurity, with applications to ransomware detection, digital forensics, and large-scale data analysis. Distinguishing encrypted from compressed fragments is particularly challenging, as short fragments lack structural data and exhibit low statistical redundancy. Traditional statistical methods based on byte-level distributions show limited effectiveness on this task. Recent machine learning approaches improve performance by learning subtle patterns from raw bytes, but predominantly rely on single-modal representations, implicitly assuming that a single view of the data is sufficient for accurate classification. This paper shows that this assumption becomes a fundamental limitation in low-information settings, where only small fragments of data are available (512–2048 bytes). We propose Triumvir, a multi-modal, uncertainty-aware ensemble architecture that integrates statistical, sequential, and spatial representations of raw byte fragments. Extensive experimental analysis demonstrates that Triumvir consistently outperforms state-of-the-art methods, with gains of up to +4.5 pp in binary and +6.4 pp in multiclass classification. Ablation studies confirm that combining modalities is critical, yielding improvements of up to +5 pp over partial configurations.
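The abstract names three views of a raw byte fragment but does not say how each is built or how the ensemble weighs them. The sketch below is one plausible reading, not the Triumvir implementation: the function names, the 32-byte row width, and the entropy-based weighting are all assumptions made for illustration. It builds a byte histogram with Shannon entropy (statistical), the ordered byte sequence (sequential), and a 2-D byte grid (spatial), then fuses per-model class probabilities so that less certain models count for less.

```python
import math
from collections import Counter


def statistical_view(fragment: bytes) -> list[float]:
    """Normalized 256-bin byte histogram followed by Shannon entropy (bits per byte)."""
    counts = Counter(fragment)
    n = len(fragment)
    hist = [counts.get(b, 0) / n for b in range(256)]
    entropy = -sum(p * math.log2(p) for p in hist if p > 0)
    return hist + [entropy]


def sequential_view(fragment: bytes) -> list[int]:
    """Byte values in their original order, as a 1-D sequence model would consume them."""
    return list(fragment)


def spatial_view(fragment: bytes, width: int = 32) -> list[list[int]]:
    """Bytes laid out row by row as a 2-D grid; a trailing partial row is dropped.

    The width of 32 is an arbitrary choice for this sketch.
    """
    rows = len(fragment) // width
    return [list(fragment[r * width:(r + 1) * width]) for r in range(rows)]


def uncertainty_weighted_fusion(probs_per_model: list[list[float]]) -> list[float]:
    """Weighted average of class probabilities from several models.

    Each model's weight is 1 minus its normalized predictive entropy, so a
    model that outputs a near-uniform distribution contributes almost nothing.
    This is a generic stand-in, not the mechanism described in the paper.
    """
    n_classes = len(probs_per_model[0])
    max_entropy = math.log2(n_classes)
    weights = []
    for probs in probs_per_model:
        h = -sum(p * math.log2(p) for p in probs if p > 0)
        weights.append(1.0 - h / max_entropy + 1e-9)  # epsilon avoids an all-zero total
    total = sum(weights)
    return [
        sum(w * probs[c] for w, probs in zip(weights, probs_per_model)) / total
        for c in range(n_classes)
    ]
```

In a full pipeline each view would feed its own classifier, and `uncertainty_weighted_fusion` would receive the three resulting probability vectors. A 1024-byte fragment, for example, yields a 257-element statistical vector, a 1024-element sequence, and a 32×32 grid.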