🤖 AI Summary
Problem: Existing malware-to-image conversion methods are largely ad hoc and take little account of the structural properties of executable files, and no generally accepted approach exists for preparing samples for image-based deep learning analysis.
Method: This paper systematically evaluates eight mainstream byte-to-image conversion strategies—including grayscale mapping, entropy visualization, and RGB encoding—across six deep learning architectures (CNN, ResNet, ViT, etc.), conducting controlled cross-method classification experiments on large-scale malware datasets.
Contribution/Results: Empirical results show minimal performance variation (<1.2% in accuracy and F1-score) across conversion methods, while the choice of model architecture has a significantly greater influence on classification performance. The results suggest that the effectiveness of image-based malware classification stems from the strengths of image analysis itself rather than from any specific visualization scheme. The work provides the first large-scale controlled evidence that, for malware image classification, *how to see* (i.e., model selection) matters more than *what to see* (i.e., byte-to-image mapping). These findings offer an empirical benchmark to guide methodological design in malware image analysis.
📝 Abstract
Recently, a considerable amount of malware research has focused on the use of powerful image-based machine learning techniques, which generally yield impressive results. However, before image-based techniques can be applied to malware, the samples must be converted to images, and there is no generally accepted approach for doing so. The malware-to-image conversion strategies found in the literature often appear to be ad hoc, with little or no effort made to take into account the properties of executable files. In this paper, we experiment with eight distinct malware-to-image conversion techniques, and for each, we test a variety of learning models. We find that several of these image conversion techniques perform similarly across a range of learning models, even though the image conversion processes are quite different. These results suggest that the effectiveness of image-based malware classification techniques may depend more on the inherent strengths of image analysis techniques than on the precise details of the image conversion strategy.
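
To make the conversion step concrete, here is a minimal sketch of one simple grayscale mapping, in which each byte of a file becomes one pixel intensity (0 to 255) and the byte stream is wrapped into rows of a fixed width. It is an illustration only and is not necessarily any of the eight techniques evaluated in the paper; the function name `bytes_to_grayscale`, the default width, the zero padding, and the sample bytes are all assumptions made for this example.

```python
def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Map raw bytes to rows of pixel intensities, one byte per pixel.

    Hypothetical helper: the row width and zero padding are illustrative
    choices, not parameters taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad with zero bytes so the length is a multiple of the row width.
    pad = (-len(data)) % width
    padded = data + bytes(pad)
    # Slice the padded byte stream into consecutive rows of `width` pixels.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Hypothetical stand-in for the contents of an executable file.
sample = bytes(range(256)) * 3 + b"\x90" * 10
image = bytes_to_grayscale(sample, width=64)
print(len(image), len(image[0]))  # prints "13 64"
```

The resulting rows can be handed to any image library or converted to an array for a learning model. Other schemes, such as entropy-based or RGB encodings, would replace the one-byte-per-pixel step with a different mapping.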