🤖 AI Summary
To address critical challenges in LLM-driven automation of scientific experimentation—low reliability, weak methodical control, and poor interpretability—this paper proposes Curie, an AI agent framework. Methodologically, Curie introduces a collaborative architecture that integrates an *intra-agent rigor module* and an *inter-agent rigor module* with an *experiment knowledge module* to ensure rigorous end-to-end execution; it also constructs a novel experimentation benchmark of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. The design combines multi-agent coordination, structured experimental planning, causal-reasoning guidance, knowledge-graph-enhanced retrieval, and LLM self-verification. Empirically, Curie achieves a 3.4× improvement over the strongest baseline in correctly answering experimental questions. All code is publicly released.
📝 Abstract
Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4$\times$ improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.