Automated Cyber Defense with Generalizable Graph-based Reinforcement Learning Agents

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional reinforcement learning (RL) for network defense typically models networks as static lists of host states, leading to overfitting on specific topologies and poor generalization to unseen network structures. Method: The authors propose a two-player partially observable Markov decision process (POMDP) framework grounded in attributed heterogeneous graphs. Networks are represented as attribute-rich, relational graphs encoding dynamic interactions among hosts and system entities, and relational inductive biases are explicitly incorporated to capture structural dependencies. A graph-editing action space enables fine-grained, topology-aware defensive operations. Contribution/Results: The framework gives agents zero-shot transfer capability: they adapt to previously unseen network topologies without retraining. Evaluated in multi-agent adversarial settings, it significantly outperforms existing state-of-the-art methods, demonstrating superior generalization, robustness, and adaptability to structural variations.
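To make the two core ideas concrete, here is a minimal sketch of an attributed-graph observation and a graph-editing defensive action. All names and structures (`GraphObservation`, `isolate_host`, the attribute keys) are illustrative assumptions, not the authors' implementation:

```python
# Sketch: a network observation as an attributed graph, with a defensive
# action expressed as an edit to that graph (in the spirit of the paper's
# graph-editing action space). Hypothetical names throughout.

class GraphObservation:
    """Attributed heterogeneous graph: nodes carry attribute dicts,
    edges are (src, relation, dst) triples."""
    def __init__(self):
        self.nodes = {}        # node_id -> attribute dict
        self.edges = set()     # {(src, relation, dst), ...}

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, src, relation, dst):
        self.edges.add((src, relation, dst))


def isolate_host(obs, host):
    """A graph-edit action: quarantine `host` by removing every edge
    that touches it, then mark the node as isolated."""
    obs.edges = {e for e in obs.edges if host not in (e[0], e[2])}
    obs.nodes[host]["isolated"] = True
    return obs


# Usage: a two-host network in which the defender isolates a
# compromised host, editing the graph rather than a flat host list.
obs = GraphObservation()
obs.add_node("host_a", os="linux", compromised=True)
obs.add_node("host_b", os="windows", compromised=False)
obs.add_edge("host_a", "connects_to", "host_b")

isolate_host(obs, "host_a")
```

Because both the observation and the action are defined over graph structure rather than a fixed-length host list, the same policy can in principle be applied to a network of any size or topology, which is the property the paper exploits for zero-shot transfer.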

📝 Abstract
Deep reinforcement learning (RL) is emerging as a viable strategy for automated cyber defense (ACD). The traditional RL approach represents networks as a list of computers in various states of safety or threat. Unfortunately, these models are forced to overfit to specific network topologies, rendering them ineffective when faced with even small environmental perturbations. In this work, we frame ACD as a two-player context-based partially observable Markov decision problem with observations represented as attributed graphs. This approach allows our agents to reason through the lens of relational inductive bias. Agents learn how to reason about hosts interacting with other system entities in a more general manner, and their actions are understood as edits to the graph representing the environment. By introducing this bias, we will show that our agents can better reason about the states of networks and zero-shot adapt to new ones. We show that this approach outperforms the state-of-the-art by a wide margin, and makes our agents capable of defending never-before-seen networks against a wide range of adversaries in a variety of complex, multi-agent environments.
Problem

Research questions and friction points this paper is trying to address.

Overcoming overfitting to specific network topologies in cyber defense
Enabling zero-shot adaptation to new network environments
Improving defense against diverse adversaries in complex multi-agent settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based reinforcement learning for cyber defense
Relational inductive bias for generalization
Zero-shot adaptation to new networks
Isaiah J. King
Cybermonic LLC, The George Washington University GraphLab
Benjamin Bowman
Cybermonic LLC, The George Washington University GraphLab
H. Howie Huang
GraphLab, George Washington University
Graph AI · Cyber Security · Computer Systems · High-Performance Computing