🤖 AI Summary
This work addresses the challenge of accurate, real-time anomaly detection in industrial control systems (ICS), where complex dependencies among sensors and actuators hinder existing approaches. It pioneers the integration of large language models (LLMs) into ICS anomaly detection: a multi-stage prompt engineering framework constructs an initial dependency graph, and an LLM-Optimization mechanism then iteratively refines the graph structure, achieving notable improvements in node accuracy, edge consistency, and logical coherence. Building on this refined graph, the method uses an improved graph neural network with an encoder-decoder architecture to detect anomalies in industrial spatiotemporal graphs via reconstruction error. Evaluated on nine datasets (two public, six simulated, and one real-world robotic arm dataset), the approach significantly outperforms twelve state-of-the-art baselines in both F1-score and the time-aware metric eTaF1.
📝 Abstract
Industrial Internet systems face increasing threats from sophisticated industrial control system (ICS) attacks, resulting in critical safety incidents. However, existing tools exhibit limited effectiveness in real-time anomaly detection due to the complex dependencies among sensors and actuators. To tackle this, we present IstGPT, the first industrial anomaly detection tool based on LLMs and graph learning to provide real-time protection against a wide range of ICS attacks. IstGPT achieves fine-grained and precise modeling of spatial-temporal dependencies in industrial cyber-physical systems. It first leverages industrial multi-modal knowledge, including operational data, technical documents, and system diagrams, to extract sensor-actuator dependency graphs via multi-stage prompt engineering. Then, LLM-Optimization iteratively refines the graph based on node accuracy, edge consistency, and logical coherence. Finally, IstGPT integrates improved graph neural networks with an encoder-decoder architecture to detect anomalies via reconstruction errors. We evaluate IstGPT against 12 state-of-the-art baselines on 9 datasets, including 2 public, 6 simulated, and a real-world robotic arm dataset. IstGPT achieves the best F1-scores and eTaF1 (a newer time-aware metric) across all nine datasets. We further discuss the feasibility of deploying IstGPT in real-world industrial scenarios.
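To make the final stage concrete, the following is a minimal sketch of anomaly detection via reconstruction error over a dependency graph. It is not IstGPT's implementation: the node names, the neighbor-averaging `reconstruct` function (a stand-in for a trained graph neural network encoder-decoder), and the `margin`-based threshold rule are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical dependency graph: each node maps to the nodes it depends on.
# Node names are illustrative and do not come from the paper.
DEPENDENCIES = {
    "pump": ["valve"],
    "valve": ["pump"],
    "flow": ["pump", "valve"],
}


def reconstruct(reading):
    """Stand-in for a trained decoder: predict each node as the mean of its neighbors."""
    return {
        node: mean(reading[n] for n in neighbors)
        for node, neighbors in DEPENDENCIES.items()
    }


def reconstruction_error(reading):
    """Mean squared error between an observed reading and its reconstruction."""
    predicted = reconstruct(reading)
    return mean((reading[n] - predicted[n]) ** 2 for n in DEPENDENCIES)


def fit_threshold(normal_readings, margin=3.0):
    """Assumed rule: threshold = margin * largest error observed on normal data."""
    return margin * max(reconstruction_error(r) for r in normal_readings)


def detect(readings, threshold):
    """Flag each reading whose reconstruction error exceeds the threshold."""
    return [reconstruction_error(r) > threshold for r in readings]


if __name__ == "__main__":
    normal = [
        {"pump": 1.0, "valve": 1.1, "flow": 1.0},
        {"pump": 1.0, "valve": 1.05, "flow": 1.0},
    ]
    threshold = fit_threshold(normal)
    suspicious = {"pump": 1.0, "valve": 1.0, "flow": 5.0}
    print(detect([normal[1], suspicious], threshold))
```

In this sketch, a reading is flagged when it breaks the relationships encoded in the graph, here a `flow` value that its neighbors cannot explain. A learned model would replace the averaging rule with a reconstruction conditioned on both graph structure and temporal context.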