LLMs Can Teach Themselves to Better Predict the Future

📅 2025-02-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of improving large language models' (LLMs') ability to predict future events without relying on human-annotated reasoning examples. The authors propose a self-supervised framework based on model self-play: the model generates diverse probabilistic reasoning trajectories; these trajectories are automatically ranked and filtered by the distance between their predicted probabilities and the actual outcomes; and the model is then fine-tuned via Direct Preference Optimization (DPO) on the resulting preference pairs. The key contribution is the first "outcome-driven reasoning trajectory selection mechanism," enabling fully label-free, outcome-guided improvement of forecasting. Evaluated on Phi-4 14B and DeepSeek-R1 14B, the method achieves a 7–10% absolute accuracy gain over the base models and a random-label DPO control, bringing these models on par with GPT-4o on the same task.

📝 Abstract
We present an outcome-driven fine-tuning framework that enhances the forecasting capabilities of large language models (LLMs) without relying on human-curated reasoning samples. Our method leverages model self-play to generate pairs of diverse reasoning trajectories and probabilistic forecasts for a set of diverse questions that resolve after the models' knowledge cutoff date. We then rank pairs of these reasoning traces by their distance to the actual outcomes before fine-tuning the model via Direct Preference Optimization (DPO). On a separate test set, our approach increases prediction accuracy of Phi-4 14B and DeepSeek-R1 14B by between 7–10% over a base model and a DPO fine-tuned control model with randomized labels, bringing them on par with forecasting capabilities of much larger frontier models like GPT-4o.
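The rank-then-pair step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Trajectory` structure, the `build_dpo_pairs` helper, and the squared-error (Brier-style) distance are all assumptions made for the sketch; the resulting `prompt`/`chosen`/`rejected` records are the standard DPO preference format.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    reasoning: str  # model-generated reasoning trace from self-play
    prob: float     # forecast probability that the event occurs

def outcome_distance(traj: Trajectory, outcome: int) -> float:
    # Squared error between the forecast and the resolved outcome
    # (1 = event occurred, 0 = it did not); smaller is better.
    return (traj.prob - outcome) ** 2

def build_dpo_pairs(question: str, trajs: list[Trajectory], outcome: int) -> list[dict]:
    """Rank self-play trajectories by distance to the real outcome and
    pair the best-scoring trace (chosen) against each worse one (rejected)."""
    ranked = sorted(trajs, key=lambda t: outcome_distance(t, outcome))
    chosen = ranked[0]
    return [
        {"prompt": question, "chosen": chosen.reasoning, "rejected": t.reasoning}
        for t in ranked[1:]
    ]
```

The resulting list of preference records can be fed directly to a DPO fine-tuning loop; no human labels are involved, since the preference signal comes entirely from the resolved outcomes.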
Problem

Research questions and friction points this paper is trying to address.

Enhance LLM forecasting without human-annotated data
Use self-play to generate diverse reasoning-trajectory pairs
Improve prediction accuracy via Direct Preference Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-play generation of reasoning trajectories
Direct Preference Optimization fine-tuning
Enhanced LLM forecasting accuracy
🔎 Similar Papers
No similar papers found.
Benjamin D. Turtel
Lightning Rod Labs
Danny Franklin
Lightning Rod Labs
Philipp Schoenegger
Microsoft AI
AI Evaluations · Forecasting · Behavioural Science