🤖 AI Summary
This paper formalizes low-level malware detection by feature matching as Order-based Malware Detection with Critical Instructions (OMDCI and its variant OMDCI+): given a pattern and a program, each a sequence of colored blocks carrying critical characters, the task is to find subsequences of the two whose blocks match in color and whose critical characters are permutations of each other. The paper proves that OMDCI+ is NP-complete, which implies that OMDCI is NP-complete as well, and that deciding whether the optimal OMDCI solution is empty (i.e., that no part of the pattern occurs in the program) is co-NP-hard; consequently, OMDCI admits no fixed-parameter tractable (FPT) algorithm unless P = co-NP. In contrast, OMDCI+ is FPT when parameterized by the pattern length $d=|M|$. The analysis combines subsequence modeling, NP-hardness proofs, and parameterized complexity theory, and it exposes inherent tractability (not decidability) limits of feature-matching-based detection, both for identifying malware and for certifying its absence. The results provide a theoretical foundation for guiding the design and evaluation of malware detection models.
📝 Abstract
We formulate low-level malware detection using algorithms based on feature matching as Order-based Malware Detection with Critical Instructions (General-OMDCI): given a pattern in the form of a sequence $M$ of colored blocks, where each block contains a critical character (representing a unique sequence of critical instructions potentially associated with malware but without certainty), and a program $A$, represented as a sequence of $n$ colored blocks with critical characters, the goal is to find two subsequences, $M'$ of $M$ and $A'$ of $A$, with blocks matching in color and whose critical characters form a permutation of each other. When $M$ is a permutation in both colors and critical characters, the problem is called OMDCI. If we additionally require $M'=M$, then the problem is called OMDCI+; if in this case $d=|M|$ is used as a parameter, then the OMDCI+ problem is easily shown to be FPT. Our main (negative) results concern the cases when $|M|$ is arbitrary and are summarized as follows: OMDCI+ is NP-complete, which implies that OMDCI is also NP-complete. For the special case of OMDCI, deciding if the optimal solution has length $0$ (i.e., deciding if no part of $M$ appears in $A$) is co-NP-hard. As a result, the OMDCI problem does not admit an FPT algorithm unless P=co-NP. In summary, our results imply that using algorithms based on feature matching either to identify malware or to determine the absence of malware in a given low-level program is hard.
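To make the OMDCI+ variant concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm, only an illustration of the definition under stated assumptions: blocks are represented as hypothetical `(color, critical_char)` tuples, "matching in color" is read as position-by-position color equality between $M$ and $A'$, and the permutation condition is read as multiset equality of the critical characters. The function name `omdci_plus_brute_force` is illustrative.

```python
from itertools import combinations


def omdci_plus_brute_force(pattern, program):
    """Return indices of a subsequence A' of `program` matching the whole
    pattern M (the OMDCI+ requirement M' = M), or None if there is none.

    Assumed encoding (not from the paper): each block is a
    (color, critical_char) tuple.
    """
    d = len(pattern)
    pattern_colors = [color for color, _ in pattern]
    pattern_chars = sorted(char for _, char in pattern)

    # Try every increasing index set of size d, i.e. every length-d subsequence.
    for indices in combinations(range(len(program)), d):
        chosen = [program[i] for i in indices]
        # Assumption: colors must agree block by block, in order.
        if [color for color, _ in chosen] != pattern_colors:
            continue
        # Assumption: critical characters must agree as multisets.
        if sorted(char for _, char in chosen) == pattern_chars:
            return indices
    return None


# Hypothetical toy instance: the colors match in order, the characters are permuted.
M = [("red", "a"), ("blue", "b")]
A = [("red", "b"), ("green", "c"), ("blue", "a")]
print(omdci_plus_brute_force(M, A))  # (0, 2)
```

The enumeration inspects up to $\binom{n}{d}$ index sets, so it is meant only to illustrate the definition on tiny inputs; it says nothing about how the FPT algorithm mentioned in the abstract works.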