🤖 AI Summary
Current LLM-based scientist agents lack the interactive reasoning and multi-perspective evaluation mechanisms that autonomous scientific discovery requires. To address this, we propose IDVSCI (Internal Discussion and Vote SCIentists), a multi-agent research team framework powered by large language models. Our method introduces (1) a dynamic knowledge exchange mechanism that enables iterative feedback and collaborative reasoning among agents, and (2) a dual-diversity peer review mechanism that emulates evaluation by experts who differ in both discipline and level of expertise, with the aim of more rigorous arguments and more creative hypotheses. IDVSCI integrates internal deliberation, voting-based decision-making, and cross-domain knowledge fusion. Evaluated on a widely used computer science benchmark and a newly curated health sciences dataset, IDVSCI achieves the best performance on both, outperforming existing systems such as AI Scientist and VIRSCI.
📝 Abstract
Scientific progress increasingly relies on effective collaboration among researchers, a dynamic that large language models (LLMs) have only begun to emulate. While recent LLM-based scientist agents show promise in autonomous scientific discovery, they often lack the interactive reasoning and evaluation mechanisms essential to real-world research. We propose IDVSCI (Internal Discussion and Vote SCIentists), a multi-agent framework built on LLMs that incorporates two key innovations: a Dynamic Knowledge Exchange mechanism enabling iterative feedback among agents, and a Dual-Diversity Review paradigm that simulates heterogeneous expert evaluation. These components jointly promote deeper reasoning and the generation of more creative and impactful scientific ideas. To evaluate the effectiveness and generalizability of our approach, we conduct experiments on two datasets: a widely used benchmark in computer science and a new dataset we introduce in the health sciences domain. Results show that IDVSCI consistently achieves the best performance across both datasets, outperforming existing systems such as AI Scientist and VIRSCI. These findings highlight the value of modeling interaction and peer review dynamics in LLM-based autonomous research.
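As a rough illustration of the review-and-vote idea described above, the following minimal sketch has reviewers who differ in field and expertise each score candidate ideas and cast one ballot, and the idea with the most ballots wins. Everything in it is an assumption made for illustration: the `Reviewer` and `Idea` types, the `score` heuristic, and the majority-vote rule are hypothetical and are not taken from the IDVSCI implementation. The dynamic knowledge exchange step is not modeled here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Reviewer:
    # Hypothetical reviewer profile: two axes of diversity.
    name: str
    field: str        # discipline, e.g. "cs" or "health"
    expertise: float  # 0.0 (junior) to 1.0 (senior)


@dataclass(frozen=True)
class Idea:
    # Hypothetical candidate research idea with toy quality attributes.
    title: str
    fields: frozenset  # disciplines the idea draws on
    novelty: float     # 0.0 to 1.0
    rigor: float       # 0.0 to 1.0


def score(reviewer: Reviewer, idea: Idea) -> float:
    """Toy heuristic (an assumption, not the paper's scoring rule).

    Senior reviewers weight rigor more heavily, junior reviewers weight
    novelty more heavily, and an idea touching the reviewer's own field
    gets a small bonus.
    """
    base = reviewer.expertise * idea.rigor + (1 - reviewer.expertise) * idea.novelty
    bonus = 0.1 if reviewer.field in idea.fields else 0.0
    return base + bonus


def vote(reviewers, ideas):
    """Each reviewer casts one ballot for their top-scored idea.

    Returns the winning title and the full ballot count. Ties are broken
    by the order in which titles first received a ballot.
    """
    ballots = Counter()
    for reviewer in reviewers:
        best = max(ideas, key=lambda idea: score(reviewer, idea))
        ballots[best.title] += 1
    winner = ballots.most_common(1)[0][0]
    return winner, ballots
```

In a real system the scores would come from LLM-generated reviews rather than a fixed formula; the sketch only shows how heterogeneous reviewer profiles can lead to different ballots before a vote aggregates them.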