🤖 AI Summary
This study addresses the challenges of information retrieval and verification in muon collider research, where a rapidly growing and heterogeneous literature complicates knowledge synthesis. The authors propose agentic hybrid RAG, an evidence-grounded retrieval-augmented generation (RAG) framework, and construct the first retrieval-augmented scientific question-answering benchmark for this domain. The framework pairs a hybrid retriever, which combines sparse lexical and dense semantic retrieval, with an agentic reasoning module for query decomposition, evidence expansion, and grounded answer generation. In the reported evaluation, the approach consistently outperforms representative retrieval and RAG baselines in retrieval effectiveness, answer quality, evidence coverage, and factual grounding, offering high-energy physics researchers a foundation for evidence-grounded scientific question answering.
📝 Abstract
Muon collider research spans accelerator physics, detector instrumentation, and high-energy phenomenology, with relevant evidence scattered across a rapidly expanding and heterogeneous body of scientific literature. As high-energy physics (HEP) increasingly explores agent-assisted analysis workflows, efficiently locating, integrating, and verifying scientific evidence becomes an essential capability. While retrieval-augmented generation (RAG) offers a promising framework for scientific question answering, integrating agentic reasoning without compromising retrieval precision remains a key challenge. In this work, we present agentic hybrid RAG, an evidence-grounded RAG framework for muon collider research. The framework combines a hybrid retriever, integrating sparse lexical and dense semantic retrieval, with an agentic reasoning module for query decomposition, evidence expansion, and grounded answer generation. To enable systematic evaluation, we construct the first benchmark for retrieval-augmented scientific question answering in the muon collider domain, comprising a curated literature corpus together with dedicated retrieval and answer-generation benchmarks covering major detector and physics research topics. Extensive evaluation shows that hybrid retrieval provides the strongest retrieval backbone, while agentic reasoning is most effective for controlled evidence expansion and answer synthesis. Built on this principle, agentic hybrid RAG consistently outperforms representative retrieval and RAG baselines in retrieval effectiveness, answer quality, evidence coverage, and factual grounding. Together, the benchmark and framework provide a foundation for evidence-grounded scientific question answering and future HEP analysis agents operating over large-scale scientific literature.
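To make the idea of a hybrid retriever concrete, the following is a minimal sketch of how a sparse lexical ranking and a dense semantic ranking might be merged. It is not the authors' implementation: the abstract does not state how the two retrievers are fused, so reciprocal rank fusion (RRF) is assumed here purely for illustration. The function names (`sparse_scores`, `dense_scores`, `rrf`, `hybrid_retrieve`), the constant `k=60`, and the toy corpus are all hypothetical, and the character-trigram cosine similarity is only a stand-in for a learned embedding model.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def sparse_scores(query, docs):
    """Lexical score: number of occurrences of query terms in each document."""
    q_terms = set(tokenize(query))
    return [
        sum(count for term, count in Counter(tokenize(d)).items() if term in q_terms)
        for d in docs
    ]


def trigram_vector(text):
    """Character-trigram counts; a stand-in for a dense embedding (assumption)."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_scores(query, docs):
    """Similarity of each document to the query in the stand-in vector space."""
    q_vec = trigram_vector(query)
    return [cosine(q_vec, trigram_vector(d)) for d in docs]


def ranking(scores):
    """Document indices ordered from highest to lowest score (stable on ties)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per document."""
    fused = Counter()
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Break exact ties by document index so the output is deterministic.
    return sorted(fused, key=lambda i: (-fused[i], i))


def hybrid_retrieve(query, docs, top_k=2):
    """Fuse the sparse and dense rankings and return the top_k document indices."""
    fused = rrf([ranking(sparse_scores(query, docs)),
                 ranking(dense_scores(query, docs))])
    return fused[:top_k]


# Toy corpus, invented for illustration only.
docs = [
    "beam-induced background in the muon collider detector",
    "higgs coupling measurements at a lepton collider",
    "cooking recipes for pasta",
]
top = hybrid_retrieve("muon collider detector background", docs)
```

Rank-based fusion is used in this sketch because it avoids having to normalize lexical and embedding scores onto a common scale; a weighted sum of normalized scores would be an equally plausible choice. In an agentic setting, a function like `hybrid_retrieve` could be called once per decomposed sub-query, with the returned evidence pooled before answer generation.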