🤖 AI Summary
This work addresses a critical research gap: the vulnerability of reinforcement learning (RL)-based defenses for critical infrastructure to omniscient attackers that exploit the defender's learning dynamics in stochastic security games. We propose a linear influence network to model inter-node dependencies and, for the first time, extend the omniscient attacker model to the stochastic game setting. To overcome state-space explosion and policy optimization challenges, we design a neuro-dynamic programming method that relaxes the structural constraints of conventional approaches. Integrating RL, stochastic game theory, linear influence networks, and neuro-dynamic programming, we establish a unified framework for computing optimal attacker strategies. Experiments demonstrate that omniscient attackers can significantly outperform naive RL defenders—validating learning dynamics as a source of systemic risk—and confirm the framework's modeling capability and effectiveness in adversarial analysis.
📝 Abstract
The adoption of reinforcement learning for critical infrastructure defense introduces a vulnerability in which sophisticated attackers can strategically exploit the defense algorithm's learning dynamics. While prior work addresses this vulnerability in the context of repeated normal-form games, its extension to stochastic games remains an open research gap. We close this gap by examining stochastic security games between an RL defender and an omniscient attacker, using a tractable linear influence network model. To overcome the structural limitations of prior methods, we propose and apply neuro-dynamic programming. Our experimental results demonstrate that the omniscient attacker can significantly outperform a naive defender, highlighting the critical vulnerability introduced by the learning dynamics and the effectiveness of the proposed attack strategy.
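To make the inter-node dependency idea concrete, here is a minimal illustrative sketch of a linear influence network. It assumes the common formulation in which each node's effective security value is a weighted linear combination of its own asset and the assets of nodes that influence it; the specific matrix, values, and normalization below are hypothetical and not taken from the paper.

```python
import numpy as np

def effective_values(W, assets):
    """Propagate raw node assets through the influence matrix.

    W[i, j] is the (hypothetical) influence of node j on node i;
    rows are normalized to sum to 1 here purely for interpretability.
    """
    return W @ assets

# Illustrative 3-node network: node 0 depends partly on nodes 1 and 2,
# node 1 is independent, node 2 partly depends on node 0.
W = np.array([
    [0.6, 0.3, 0.1],
    [0.0, 1.0, 0.0],
    [0.2, 0.0, 0.8],
])
assets = np.array([10.0, 5.0, 8.0])

print(effective_values(W, assets))  # effective value of each node
```

Compromising one node thus degrades the effective value of every node it influences, which is what couples the attacker's per-node choices into a network-wide stochastic game.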