🤖 AI Summary
This work addresses the evaluation and coordination challenges of large language models (LLMs) in multi-agent autonomous cyber defense (ACD). We first integrate LLM-based agents into the CybORG CAGE 4 platform, constructing a hybrid ACD system that combines LLMs with reinforcement learning (RL) agents. Our method introduces the first LLM-RL hybrid architecture tailored for multi-agent ACD, along with a lightweight semantic communication protocol, overcoming limitations of single-agent paradigms. Experimental results demonstrate that LLMs excel in zero-shot response generation, action interpretability, and cross-scenario generalization, whereas RL agents exhibit superior long-term strategic stability. Crucially, their collaboration significantly enhances overall defense efficiency and robustness. The findings reveal systematic complementarity between LLMs and RL agents along three dimensions: interpretability, generalization capability, and collaborative adaptability, establishing foundational insights for hybrid agent design in autonomous cyber defense.
📝 Abstract
Fast and effective incident response is essential to prevent adversarial cyberattacks. Autonomous Cyber Defense (ACD) aims to automate incident response through Artificial Intelligence (AI) agents that plan and execute actions. Most ACD approaches focus on single-agent scenarios and leverage Reinforcement Learning (RL). However, RL-trained ACD agents depend on costly training, and their reasoning is not always explainable or transferable. Large Language Models (LLMs) can address these concerns by providing explainable actions in general security contexts. Researchers have explored LLM agents for ACD but have not evaluated them in multi-agent scenarios or in interaction with other ACD agents. In this paper, we present the first study of how LLMs perform in multi-agent ACD environments by proposing a new integration into the CybORG CAGE 4 environment. We examine how ACD teams of LLM and RL agents can interact by proposing a novel communication protocol. Our results highlight the strengths and weaknesses of LLMs and RL and help us identify promising research directions to create, train, and deploy future teams of ACD agents.