Towards Diverse Scientific Hypothesis Search with Large Language Models

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited diversity and robustness of existing scientific hypothesis generation methods, which often suffer from over-optimization and struggle with noisy, costly downstream validation. The authors formulate hypothesis generation as a constrained sampling problem and propose a multi-temperature evolutionary framework inspired by parallel tempering. By enabling information exchange across temperatures, the approach enhances exploration and effectively mitigates diversity collapse. Integrating large language models with multi-temperature evolutionary search, the method substantially improves both the quality and diversity of generated hypotheses in tasks spanning molecular design, equation discovery, and algorithm synthesis. Notably, it demonstrates superior robustness and practical utility under constrained validation budgets.

📝 Abstract

Large language models (LLMs) are increasingly used to accelerate scientific discovery, most recently in advanced tasks such as generating valid scientific hypotheses. Yet in many discovery settings, the goal is not to identify a single best hypothesis: validation can be noisy and expensive, so scientists benefit from a set of high-quality alternative hypotheses that hedge against downstream uncertainty. Nevertheless, commonly used evolutionary search recipes tend to prioritize optimization over exploration in hypothesis generation, and the resulting selection pressure during the search process leads to diversity collapse. Motivated by these limitations, we formulate hypothesis search as a sampling problem, where the objective is to efficiently produce diverse, high-quality hypotheses under a fixed validation budget. Building on this perspective, we propose an evolutionary framework inspired by the classical parallel tempering algorithm that searches hypotheses at multiple temperature levels and enables principled information exchange across temperatures to improve exploration without disrupting convergence. Across domains including molecular discovery, equation discovery, and algorithm discovery, our approach consistently improves both hypothesis quality and diversity under the same validation budget, and produces candidates that remain robust under more expensive downstream computational validations.
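
The abstract does not spell out the algorithm, so the following is only a minimal sketch of classical parallel tempering, the method the framework is said to be inspired by, and not the authors' implementation. It assumes a "higher is better" score, a target distribution proportional to exp(score / T) at temperature T, and a symmetric random mutation standing in for an LLM-driven proposal. The names `swap_probability`, `parallel_tempering`, `score_fn`, `mutate_fn`, and `swap_every` are hypothetical.

```python
import math
import random


def swap_probability(score_cold, score_hot, temp_cold, temp_hot):
    """Metropolis probability of exchanging two replicas.

    Assumes the target at temperature T is proportional to exp(score / T).
    """
    delta = (1.0 / temp_cold - 1.0 / temp_hot) * (score_hot - score_cold)
    return 1.0 if delta >= 0 else math.exp(delta)


def parallel_tempering(score_fn, mutate_fn, init, temperatures,
                       steps, swap_every=5, seed=0):
    """Run one replica per temperature, with periodic neighbor swaps.

    `temperatures` is ordered from coldest (most greedy) to hottest
    (most exploratory). Every call to `score_fn` stands for one unit
    of validation budget.
    """
    rng = random.Random(seed)
    init_score = score_fn(init)
    states = [init for _ in temperatures]
    scores = [init_score for _ in temperatures]

    for step in range(1, steps + 1):
        # Local move: each replica proposes and accepts at its own temperature.
        for i, temp in enumerate(temperatures):
            candidate = mutate_fn(states[i], rng)
            candidate_score = score_fn(candidate)
            accept = math.exp(min(0.0, (candidate_score - scores[i]) / temp))
            if rng.random() < accept:
                states[i], scores[i] = candidate, candidate_score

        # Exchange move: adjacent temperatures may trade their states.
        if step % swap_every == 0:
            for i in range(len(temperatures) - 1):
                p = swap_probability(scores[i], scores[i + 1],
                                     temperatures[i], temperatures[i + 1])
                if rng.random() < p:
                    states[i], states[i + 1] = states[i + 1], states[i]
                    scores[i], scores[i + 1] = scores[i + 1], scores[i]

    return states, scores


def toy_score(x):
    """Toy objective with its maximum at x = 3 (illustrative only)."""
    return -(x - 3.0) ** 2


def toy_mutate(x, rng):
    """Symmetric Gaussian perturbation, a placeholder for an LLM proposal."""
    return x + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    final_states, final_scores = parallel_tempering(
        toy_score, toy_mutate, init=0.0,
        temperatures=[0.1, 1.0, 10.0], steps=200,
    )
    print(final_states, final_scores)
```

In this sketch a swap is always accepted when the hotter replica holds the better-scoring state, which is how good candidates found during exploration migrate toward the colder, more selective levels. Hot replicas keep accepting worse moves, which is the mechanism that classical parallel tempering relies on to avoid collapsing onto a single mode.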

Problem

Research questions and friction points this paper is trying to address.

scientific hypothesis generation

diversity collapse

validation budget

exploration vs. optimization

hypothesis search

Innovation

Methods, ideas, or system contributions that make the work stand out.

hypothesis generation

diversity-aware search

parallel tempering