🤖 AI Summary
Problem: The lack of standardized, reproducible, and open benchmarks hinders systematic evaluation and advancement of AI agents for network troubleshooting.
Method: The paper introduces NetBench, the first open-source experimental platform tailored to this domain. It integrates large language models (LLMs) with AI agent architectures to support automated diagnostic reasoning, network state comprehension, and interactive fault isolation, enabling low-barrier, reproducible, and comparable evaluation of agents.
Contribution/Results: (1) A modular, extensible benchmarking platform prototype; (2) A unified evaluation protocol and a curated set of representative network failure scenarios; (3) Substantial reduction in operational overhead for developing and validating AI agents. Experimental results demonstrate NetBench’s feasibility and generalizability across diverse network issues, establishing foundational infrastructure for systematic research and practical deployment of generative AI in network operations.
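Since the platform's interfaces are not detailed here, the following is a minimal, hypothetical sketch of how a curated fault scenario and a unified evaluation loop might be expressed on a NetBench-like platform. All names (`FaultScenario`, `evaluate_agent`, the scenario fields) are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of a NetBench-style scenario and evaluation protocol.
# Class and function names are illustrative, not the platform's real API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FaultScenario:
    """A reproducible network failure case the agent must diagnose."""
    name: str
    topology: str            # e.g. path to an emulated-network description
    injected_fault: str      # ground-truth root cause, hidden from the agent
    expected_diagnosis: str  # label used by the evaluation protocol


@dataclass
class EvaluationResult:
    scenario: str
    correct: bool


def evaluate_agent(agent: Callable[[FaultScenario], str],
                   scenarios: List[FaultScenario]) -> List[EvaluationResult]:
    """Run the agent on each scenario and score it under one shared protocol."""
    results = []
    for sc in scenarios:
        diagnosis = agent(sc)  # an LLM agent would probe the emulated network here
        results.append(EvaluationResult(
            scenario=sc.name,
            correct=diagnosis.strip().lower() == sc.expected_diagnosis.lower(),
        ))
    return results


if __name__ == "__main__":
    scenarios = [
        FaultScenario("bgp-session-down", "topologies/leaf_spine.yaml",
                      injected_fault="wrong ASN on leaf1",
                      expected_diagnosis="bgp misconfiguration"),
    ]
    # Trivial stand-in agent; an LLM-backed troubleshooting agent would go here.
    dummy_agent = lambda sc: "bgp misconfiguration"
    for r in evaluate_agent(dummy_agent, scenarios):
        print(r)
```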
📝 Abstract
Recent research has demonstrated the effectiveness of Artificial Intelligence (AI), and more specifically Large Language Models (LLMs), in supporting tasks such as network configuration synthesis and automated network diagnosis. In this preliminary work, we restrict our focus to the application of AI agents to network troubleshooting and elaborate on the need for a standardized, reproducible, and open benchmarking platform on which AI agents can be built and evaluated with low operational effort.