🤖 AI Summary
Background: Defending network devices against 1-day/n-day exploits remains challenging: existing host-patching and network-filtering solutions scale poorly across diverse devices, are often incompatible with embedded or legacy systems, and depend on error-prone manual deployment.
Method: This paper proposes REFN, a novel network-driven reinforcement learning framework that uses real-time traffic as an online reward signal to guide a large language model (LLM) in autonomously generating precise network filtering rules. REFN integrates agentic-RAG-based knowledge distillation to deepen vulnerability semantics understanding, employs a VNF pipeline for end-to-end "language-to-network-policy" translation, and incorporates an online agentic validation mechanism to mitigate LLM hallucination.
Contribution/Results: Evaluated across 22 families of real-world 1-day/n-day exploits, REFN achieves 21.1% higher accuracy than state-of-the-art alternatives, reduces Mean Time To Patch to 3.65 hours, and scales readily to roughly 10K heterogeneous network devices.
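As an illustrative (hypothetical) example of the kind of network filter such a pipeline might emit, the following gateway-side iptables rule drops forwarded traffic carrying an exploit marker. The port and pattern are invented for exposition and are not rules produced by REFN:

```shell
# Hypothetical gateway filter: drop forwarded HTTP requests whose payload
# contains a known n-day exploit marker, via the iptables string match module.
# All specifics (port, pattern) are assumptions for illustration.
iptables -A FORWARD -p tcp --dport 80 \
  -m string --string "/cgi-bin/vulnerable_endpoint" --algo bm \
  -j DROP
```

A rule of this shape can be pushed uniformly to edge gateways, which is what sidesteps per-device patch compatibility issues.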
📄 Abstract
The exploitation of 1-day or n-day vulnerabilities poses severe threats to networked devices due to massive deployment scales and delayed patching (the average Mean Time To Patch exceeds 60 days). Existing defenses, including host-based patching and network-based filtering, are inadequate due to limited scalability across diverse devices, compatibility issues (especially with embedded or legacy systems), and error-prone deployment processes (manual patch validation). To address these issues, we introduce REFN (Reinforcement Learning From Network), a novel framework that trains Large Language Models (LLMs) to autonomously generate network filters that prevent 1-day or n-day exploitations. REFN ensures scalability by uniquely employing Reinforcement Learning (RL) driven by online network rewards instead of traditional Human Feedback (RLHF), guarantees compatibility via unified deployment on edge security gateways (e.g., Amazon Eero), and provides robustness via online validation against real network traffic. Crucially, REFN addresses three core challenges in training LLMs for exploit prevention: 1) expanding current LLMs' limited vulnerability-fixing expertise via Agentic-RAG-based Knowledge Distillation; 2) bridging current LLMs' language-to-network gap through an RL-From-VNF Pipeline that translates language context (vulnerability descriptions) into network enforcement; and 3) mitigating LLM hallucination and non-determinism via Online Agentic Validation that penalizes erroneous outputs. Evaluated across 22 families of 1-day or n-day exploits, REFN demonstrates effectiveness (21.1% higher accuracy than alternatives), efficiency (Mean Time To Patch of 3.65 hours), and scalability (easily scaling to 10K devices). REFN serves as an initial step toward training LLMs to rapidly prevent massive-scale 1-day or n-day exploitations.
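The network-reward idea (scoring a generated filter by replaying traffic through it instead of asking humans, with a hard penalty for malformed or hallucinated rules) can be sketched in a few lines. Every name, the toy rule syntax, and the reward shaping below are assumptions for exposition, not REFN's actual interface:

```python
# Minimal sketch of an online network reward, assuming:
#  - the LLM policy proposes a textual filter rule for a vulnerability,
#  - validation replays recorded exploit and benign flows through the rule,
#  - malformed (hallucinated) rules receive a hard penalty.

def matches(rule: str, flow: str) -> bool:
    """Toy matcher: the rule 'blocks' a flow if the rule's pattern occurs in it."""
    pattern = rule.split("payload~", 1)[-1].strip()
    return pattern in flow

def network_reward(rule: str, exploit_flows, benign_flows) -> float:
    if "payload~" not in rule:          # malformed output from the LLM
        return -1.0                     # hard penalty, per online validation
    blocked_exploits = sum(matches(rule, f) for f in exploit_flows)
    blocked_benign = sum(matches(rule, f) for f in benign_flows)
    # Reward exploit coverage, penalize collateral damage to benign traffic.
    return (blocked_exploits / max(len(exploit_flows), 1)
            - blocked_benign / max(len(benign_flows), 1))

exploits = ["POST /cgi-bin/vuln?x=;id;", "GET /cgi-bin/vuln?x=;reboot;"]
benign = ["GET /index.html", "GET /style.css"]

good_rule = "drop tcp dport 80 payload~ /cgi-bin/vuln"
print(network_reward(good_rule, exploits, benign))        # 1.0: blocks both exploits, no benign
print(network_reward("nonsense rule", exploits, benign))  # -1.0: malformed rule penalized
```

In an RL loop, this scalar would be fed back to update the LLM policy, so rules that block exploits without disrupting legitimate traffic are reinforced while hallucinated or over-broad rules are penalized.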