Abstract
We present Archi, an open-source, end-to-end framework for scientific collaborations that combines the systematic ingestion and organization of heterogeneous data sources with the deployment of configurable, private, and extensible agents that retrieve and reason over them. Since February 2026, an instance of Archi has been deployed for the Computing Operations team of the CMS experiment at CERN's LHC as a support agent for technical operators. It provides retrieval and analysis by combining documentation, historical data, and live monitoring systems. We evaluate the system using operator feedback and a set of questions collected from production usage, graded by human and automated panels. The system proves effective at operational tasks, resolving real-world queries posed by CMS operators. We also observe that locally hosted, open-weight models perform competitively, enabling fully private handling of sensitive data.
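To make the retrieve-then-reason pattern concrete, here is a minimal sketch of how passages from several heterogeneous sources (documentation, historical tickets, monitoring snapshots) might be pooled, ranked against an operator query, and assembled into a prompt. Everything here is an illustrative assumption and not Archi's actual API: the names `Document`, `retrieve`, and `build_prompt` and the source labels are hypothetical. A simple term-overlap score stands in for the embedding-based vector search a production retrieval-augmented system would more likely use, and the call to a locally hosted language model is omitted.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical record type; "source" labels where a passage came from.
    source: str
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric word tokens (a stand-in for a real embedding model).
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Rank documents from all sources by shared-term count with the query."""
    q = Counter(tokenize(query))

    def score(doc: Document) -> int:
        # Multiset intersection: sum of the minimum counts of shared terms.
        return sum((q & Counter(tokenize(doc.text))).values())

    # sorted() is stable, so ties keep their original corpus order.
    ranked = sorted(corpus, key=score, reverse=True)
    # Drop passages with no overlap at all, then keep the top k.
    return [doc for doc in ranked if score(doc) > 0][:k]


def build_prompt(query: str, context: list[Document]) -> str:
    # Tag each passage with its source so the model can say where facts came from.
    lines = [f"[{doc.source}] {doc.text}" for doc in context]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {query}"
```

In this sketch, passing the string returned by `build_prompt` to a model served on local hardware would keep both the query and the retrieved passages on-premises, which is the privacy property the abstract emphasizes.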