Exploiting LLMs for Automatic Hypothesis Assessment via a Logit-Based Calibrated Prior

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automated hypothesis generation has created a bottleneck in hypothesis evaluation, particularly in identifying novel, exploratory correlation hypotheses among vast numbers of statistical relationships. Method: We propose a relevance-oriented automatic hypothesis evaluation framework centered on a Logit-based Calibrated Prior (LCP). LCP leverages raw logits from large language models (LLMs) to construct calibrated, continuous prior distributions over variable-pair correlation strengths, enabling zero-shot generalization and context-sensitive inference without fine-tuning. Contribution/Results: Evaluated on 2,096 real-world variable pairs, LCP achieves 78.8% sign accuracy, 0.26 mean absolute error, and 89.2% coverage of 95% credible intervals. It outperforms a fine-tuned RoBERTa classifier in binary correlation classification and achieves higher precision@K in hypothesis ranking.

📝 Abstract
As hypothesis generation becomes increasingly automated, a new bottleneck has emerged: hypothesis assessment. Modern systems can surface thousands of statistical relationships (correlations, trends, causal links) but offer little guidance on which ones are novel, non-trivial, or worthy of expert attention. In this work, we study the complementary problem to hypothesis generation: automatic hypothesis assessment. Specifically, we ask: given a large set of statistical relationships, can we automatically assess which ones are novel and worth further exploration? We focus on correlations because they are a common entry point in exploratory data analysis and often serve as the basis for forming deeper scientific or causal hypotheses. To support automatic assessment, we propose to leverage the vast knowledge encoded in LLMs' weights to derive a prior distribution over the correlation value of a variable pair. If an LLM's prior expects the observed correlation value, that correlation is not surprising, and vice versa. We propose the Logit-based Calibrated Prior, an LLM-elicited correlation prior that transforms the model's raw output logits into a calibrated, continuous predictive distribution over correlation values. We evaluate the prior on a benchmark of 2,096 real-world variable pairs, where it achieves a sign accuracy of 78.8%, a mean absolute error of 0.26, and 95% credible interval coverage of 89.2% in predicting the Pearson correlation coefficient. It also outperforms a fine-tuned RoBERTa classifier in binary correlation prediction and achieves higher precision@K in hypothesis ranking. We further show that the prior generalizes to correlations not seen during LLM pretraining, reflecting context-sensitive reasoning rather than memorization.
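The core construction in the abstract — turning raw logits over discretized correlation values into a calibrated predictive distribution — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bin discretization, temperature scaling as the calibration step, and all function names are assumptions.

```python
import numpy as np

def calibrated_prior(bin_logits, temperature=1.0):
    """Turn raw LLM logits over discretized correlation bins into a
    calibrated discrete distribution (temperature-scaled softmax).
    temperature > 1 flattens the distribution; < 1 sharpens it."""
    z = np.asarray(bin_logits, dtype=float) / temperature
    z -= z.max()                       # numerical stability
    p = np.exp(z)
    return p / p.sum()

def summarize(probs, bin_values, mass=0.95):
    """Prior mean and a central credible interval over correlation values."""
    bin_values = np.asarray(bin_values, dtype=float)
    mean = float(probs @ bin_values)
    order = np.argsort(bin_values)
    cdf = np.cumsum(probs[order])
    lo = bin_values[order][np.searchsorted(cdf, (1 - mass) / 2)]
    hi = bin_values[order][np.searchsorted(cdf, 1 - (1 - mass) / 2)]
    return mean, (float(lo), float(hi))

# Toy example: 9 bins spanning [-1, 1], logits peaked near r = 0.5
bins = np.linspace(-1, 1, 9)
logits = [-4, -3, -2, -1, 0, 2, 4, 2, -1]   # hypothetical LLM logits
p = calibrated_prior(logits, temperature=1.5)
mean, ci = summarize(p, bins)
```

An observed correlation falling outside the credible interval `ci` would mark the variable pair as surprising relative to the LLM's prior knowledge.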
Problem

Research questions and friction points this paper is trying to address.

Automatically assess novelty of statistical relationships
Leverage LLMs to predict correlation values accurately
Rank hypotheses for scientific exploration efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs for automatic hypothesis assessment
Logit-based calibrated prior for correlation prediction
Transforming LLM logits into predictive distributions
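The ranking use case mentioned above (precision@K in hypothesis ranking) can be illustrated by scoring each variable pair by how surprising its observed correlation is under the elicited prior. The nearest-bin surprisal score and all names below are assumptions for illustration, not the paper's method.

```python
import math

def surprisal(probs, bin_values, observed_r):
    """Negative log prior probability of the bin nearest the observed
    correlation; higher means the observation was less expected."""
    i = min(range(len(bin_values)), key=lambda j: abs(bin_values[j] - observed_r))
    return -math.log(max(probs[i], 1e-12))

def rank_hypotheses(pairs):
    """pairs: list of (name, probs, bin_values, observed_r).
    Returns pair names sorted most-surprising first."""
    scored = [(surprisal(p, b, r), name) for name, p, b, r in pairs]
    return [name for _, name in sorted(scored, reverse=True)]

bins = [-1.0, -0.5, 0.0, 0.5, 1.0]
peaked = [0.02, 0.06, 0.84, 0.06, 0.02]   # prior expects r ≈ 0
pairs = [
    ("expected", peaked, bins, 0.05),     # matches the prior: low surprisal
    ("novel",    peaked, bins, 0.90),     # prior puts little mass near 0.9
]
ranking = rank_hypotheses(pairs)          # "novel" ranks first
```

Under this scheme, pairs whose observed correlation sits in a low-probability region of the prior surface to the top of the list for expert attention.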
Yue Gong
Department of Computer Science, The University of Chicago
Raul Castro Fernandez
The University of Chicago
DataSystems