MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

๐Ÿ“… 2025-05-23
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address the limited efficiency of hypothesis validation caused by the high cost and low throughput of wet-lab experiments, this paper proposes an experiment-guided hypothesis-ranking paradigm. The authors formally define this task for the first time; develop an interpretable *in silico* hypothesis simulator that integrates domain knowledge with noise-aware modeling; and design a dynamic ranking method that leverages functional clustering and simulation-based feedback. Evaluated on a real-world chemistry dataset of 124 hypotheses, the approach significantly outperforms baselines that rely solely on the internal reasoning of large language models, and ablation studies confirm the contribution of each component. The results show that the simulation-feedback mechanism effectively bridges the gap between theoretical inference and empirical constraints while remaining general and interpretable, establishing a novel pathway for data- and experiment-co-driven scientific discovery.

๐Ÿ“ Abstract
Hypothesis ranking is a crucial component of automated scientific discovery, particularly in natural sciences where wet-lab experiments are costly and throughput-limited. Existing approaches focus on pre-experiment ranking, relying solely on large language models' internal reasoning without incorporating empirical outcomes from experiments. We introduce the task of experiment-guided ranking, which aims to prioritize candidate hypotheses based on the results of previously tested ones. However, developing such strategies is challenging due to the impracticality of repeatedly conducting real experiments in natural science domains. To address this, we propose a simulator grounded in three domain-informed assumptions, modeling hypothesis performance as a function of similarity to a known ground-truth hypothesis, perturbed by noise. We curate a dataset of 124 chemistry hypotheses with experimentally reported outcomes to validate the simulator. Building on this simulator, we develop a pseudo experiment-guided ranking method that clusters hypotheses by shared functional characteristics and prioritizes candidates based on insights derived from simulated experimental feedback. Experiments show that our method outperforms pre-experiment baselines and strong ablations.
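The abstract describes the simulator as modeling hypothesis performance as similarity to a ground-truth hypothesis perturbed by noise. A minimal sketch of that idea, assuming similarity scores in [0, 1] and additive Gaussian noise (the function name, noise scale, and clipping are illustrative assumptions, not the paper's exact formulation):

```python
import random

def simulate_outcome(similarity, noise_scale=0.1, rng=None):
    """Pseudo experimental score for a hypothesis: its similarity to the
    ground-truth hypothesis (assumed in [0, 1]) plus Gaussian noise,
    clipped back into [0, 1]."""
    rng = rng or random.Random()
    noisy = similarity + rng.gauss(0.0, noise_scale)
    return max(0.0, min(1.0, noisy))

# With zero noise the score reduces to the similarity itself.
print(simulate_outcome(0.75, noise_scale=0.0))  # 0.75
```

In this reading, a hypothesis close to the (unknown to the ranker) ground truth tends to yield a high simulated score, so observed outcomes carry a noisy signal about which regions of hypothesis space are promising.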
Problem

Research questions and friction points this paper is trying to address.

Automating hypothesis ranking for costly wet-lab experiments
Incorporating experimental feedback to improve hypothesis prioritization
Simulating experiments to validate ranking strategies without real tests
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulator models hypothesis performance with noise
Clusters hypotheses by shared functional characteristics
Prioritizes candidates using simulated experimental feedback
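The three innovation points above combine into a feedback loop: cluster hypotheses by shared functional characteristics, then spend the experiment budget on candidates from clusters whose tested members scored well. A greedy sketch of that loop, assuming clusters are given and `run_experiment` is the simulator (all names and the running-mean heuristic are illustrative assumptions, not the paper's exact method):

```python
def rank_with_feedback(hypotheses, clusters, run_experiment, budget):
    """Greedy pseudo experiment-guided ranking.

    hypotheses     -- list of hypothesis ids
    clusters       -- dict: hypothesis id -> functional-cluster label
    run_experiment -- callable: id -> simulated experimental score
    budget         -- number of simulated experiments allowed
    """
    scores = {}        # observed scores for tested hypotheses
    cluster_mean = {}  # running mean observed score per cluster
    untested = list(hypotheses)

    for _ in range(min(budget, len(untested))):
        # Test next a hypothesis from the currently best-looking cluster.
        untested.sort(key=lambda h: cluster_mean.get(clusters[h], 0.0),
                      reverse=True)
        h = untested.pop(0)
        scores[h] = run_experiment(h)
        c = clusters[h]
        seen = [scores[x] for x in scores if clusters[x] == c]
        cluster_mean[c] = sum(seen) / len(seen)

    # Final ranking: tested hypotheses by observed score,
    # then untested ones by their cluster's mean observed score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    ranked += sorted(untested,
                     key=lambda h: cluster_mean.get(clusters[h], 0.0),
                     reverse=True)
    return ranked
```

The design choice here is that feedback propagates through clusters: one good result promotes every untested hypothesis sharing that functional characteristic, which is what lets a small experiment budget reorder a large candidate pool.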