Towards a Playground to Democratize Experimentation and Benchmarking of AI Agents for Network Troubleshooting

📅 2025-07-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
The lack of standardized, reproducible, and open benchmarks hinders the systematic evaluation and advancement of AI agents for network troubleshooting. Method: This paper introduces NetBench, an open-source experimental platform tailored to this domain, which integrates large language models (LLMs) with AI agent architectures to enable automated diagnostic reasoning, network state comprehension, and interactive fault isolation, thereby supporting low-barrier, reproducible, and comparable model evaluation. Contribution/Results: (1) a modular, extensible benchmarking platform prototype; (2) a unified evaluation protocol and a curated set of representative network failure scenarios; (3) a substantial reduction in the operational overhead of developing and validating AI agents. Experimental results demonstrate the platform's feasibility and generalizability across diverse network issues, establishing foundational infrastructure for systematic research and practical deployment of generative AI in network operations.
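The paper itself does not publish code here; purely as an illustration of the kind of "unified evaluation protocol over curated failure scenarios" the summary describes, the sketch below shows a minimal benchmark harness. All names (`FaultScenario`, `evaluate`, the scenario labels) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class FaultScenario:
    """A reproducible network-failure case (hypothetical schema)."""
    name: str
    symptoms: Dict[str, str]   # e.g. tool outputs or log excerpts shown to the agent
    root_cause: str            # ground-truth label used for scoring

def evaluate(agent: Callable[[Dict[str, str]], str],
             scenarios: List[FaultScenario]) -> float:
    """Run the agent on every scenario and return diagnosis accuracy."""
    correct = sum(agent(s.symptoms) == s.root_cause for s in scenarios)
    return correct / len(scenarios)

# Toy usage: a rule-based "agent" standing in for an LLM-driven one.
scenarios = [
    FaultScenario("dns-outage", {"dig": "connection timed out"}, "dns-failure"),
    FaultScenario("mtu-blackhole", {"ping": "frag needed"}, "mtu-mismatch"),
]

def toy_agent(symptoms: Dict[str, str]) -> str:
    return "dns-failure" if "timed out" in symptoms.get("dig", "") else "mtu-mismatch"

print(evaluate(toy_agent, scenarios))  # → 1.0
```

A real platform would replace `toy_agent` with an LLM-backed agent that can interactively query the network, and would score partial diagnoses rather than exact label matches; the fixed scenario set and single scoring function are what make runs reproducible and comparable.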

📝 Abstract
Recent research has demonstrated the effectiveness of Artificial Intelligence (AI), and more specifically Large Language Models (LLMs), in supporting network configuration synthesis and automating network diagnosis tasks, among others. In this preliminary work, we restrict our focus to the application of AI agents to network troubleshooting and elaborate on the need for a standardized, reproducible, and open benchmarking platform on which to build and evaluate AI agents with low operational effort.
Problem

Research questions and friction points this paper is trying to address.

Developing a playground for AI agent benchmarking
Standardizing network troubleshooting evaluation methods
Enabling low-effort AI agent experimentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI agents for network troubleshooting
Standardized benchmarking platform
Low operational effort evaluation
👥 Authors
Zhihao Wang — Peking University (Robotics, Reinforcement Learning)
Alessandro Cornacchia — KAUST
Franco Galante — Politecnico di Torino
Carlo Centofanti — Assistant Professor @ DISIM (SDN, Network Virtualization, Multi-access Edge Computing, Power Consumption, Cloud Computing)
Alessio Sacco — Politecnico di Torino
Dingde Jiang — UESTC