🤖 AI Summary
This work addresses the lack of systematic investigation into how reasoning data function in the post-training stage, a gap that has left findings fragmented and often irreproducible. It presents the first structured survey of this domain, synthesizing over 150 publicly available studies and system reports. The study is organized around four core dimensions: formats of reasoning data, sources of their effectiveness, construction methodologies, and scaling behaviors. From these it proposes a unified attribution framework. Through literature review, taxonomic categorization, and cross-study comparison, it integrates advances spanning datasets, reinforcement learning strategies, reward models, and benchmarking protocols. The resulting body of knowledge offers a traceable and actionable foundation for using reasoning data, informing future data-release practices and training-pipeline design.
📝 Abstract
Post-training has become a primary driver of recent progress in large reasoning models, and reasoning data are often the key variable determining whether this stage succeeds. Work on post-training reasoning data has grown rapidly, yet this literature remains scattered across dataset papers, reinforcement-learning recipes, reward-model studies, benchmarks, and frontier system reports. This paper is the first primer to synthesize over 150 key public studies and system reports on post-training reasoning data. We organize the field around four questions: what data objects exist, what makes them useful, how they are constructed, and how they scale. Together, this organization provides an attribution framework for future reasoning-data releases and post-training recipes.