Neural Sabermetrics with World Model: Play-by-play Predictive Modeling with Large Language Model

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes the first application of large language models (LLMs) as world models in sports analytics, addressing the longstanding challenge of multi-step dynamic forecasting in baseball. Leveraging over a decade of high-fidelity MLB tracking data—comprising more than 7 million pitch sequences and approximately 3 billion tokens—the authors develop an autoregressive, pitch-by-pitch generative model via continued pretraining. This model unifies the multidimensional evolution of gameplay within a single framework and substantially outperforms existing neural baselines: it achieves 64% accuracy in predicting the outcome of the next pitch and 78% accuracy in forecasting batter swing decisions across both regular-season and postseason games. By enabling forward-looking prediction rather than retrospective analysis alone, the approach transcends the limitations of traditional baseball metrics.

📝 Abstract
Classical sabermetrics has profoundly shaped baseball analytics by summarizing long histories of play into compact statistics. While these metrics are invaluable for valuation and retrospective analysis, they do not define a generative model of how baseball games unfold pitch by pitch, leaving most existing approaches limited to single-step prediction or post-hoc analysis. In this work, we present Neural Sabermetrics with World Model, a Large Language Model (LLM) based play-by-play world model for baseball. We cast baseball games as long autoregressive sequences of events and continually pretrain a single LLM on more than ten years of Major League Baseball (MLB) tracking data, comprising over seven million pitch sequences and approximately three billion tokens. The resulting model predicts multiple aspects of game evolution within a unified framework. We evaluate it on both in-distribution regular-season data and out-of-distribution postseason games and compare against strong neural baselines from prior work. Despite using a single backbone model, our approach outperforms existing baselines, (1) correctly predicting approximately 64% of next pitches within a plate appearance and (2) 78% of batter swing decisions, suggesting that LLMs can serve as effective world models for sports.
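The abstract's core idea is to treat a game as an autoregressive token sequence and predict the next pitch event from the history. The paper's actual tokenization and model are not specified on this page; the sketch below is a hypothetical, minimal illustration of that framing using a toy bigram counter in place of an LLM, with made-up pitch-event tokens (e.g. `FF:foul` for a fastball fouled off).

```python
from collections import Counter, defaultdict

# Hypothetical serialization: each pitch becomes one token combining
# pitch type and outcome. The real tokenization scheme is an assumption here.
SEQUENCES = [
    ["FF:ball", "SL:swing_miss", "FF:foul", "SL:swing_miss"],
    ["FF:ball", "FF:foul", "SL:swing_miss"],
    ["FF:ball", "FF:foul", "SL:ball"],
]

def train_bigram(sequences):
    """Count successor-token frequencies conditioned on the previous token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Greedy 'next pitch' prediction: the most frequent successor token."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

model = train_bigram(SEQUENCES)
print(predict_next(model, "FF:ball"))  # → FF:foul
```

An LLM replaces the bigram table with a learned distribution over the full plate-appearance history, but the prediction loop, conditioning on prior tokens and emitting the next event, has the same shape.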
Problem

Research questions and friction points this paper is trying to address.

sabermetrics
play-by-play prediction
world model
baseball analytics
generative modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Model
World Model
Play-by-play Prediction
Neural Sabermetrics
Autoregressive Sequence Modeling
Young Jin Ahn
Carnegie Mellon University
Yiyang Du
Carnegie Mellon University
Multimodal NLP
Zheyuan Zhang
Carnegie Mellon University
NLP, Human-AI Interaction
Haisen Kang
Carnegie Mellon University