AI Summary
Existing methods for synthesizing multi-step reasoning data mostly rely on rejection sampling, which generates trajectories independently, is inefficient, and samples problems of different difficulty unevenly. This paper proposes FastMCTS, a lightweight data synthesis strategy inspired by Monte Carlo Tree Search (MCTS). Its main contributions are: (i) step-level evaluation signals that guide the search and replace redundant independent sampling; and (ii) more balanced sampling across problems of different difficulty levels, achieved by trading off exploration and exploitation during the search. In experiments on English and Chinese reasoning datasets, FastMCTS generates over 30% more correct reasoning paths than rejection sampling as the number of generated tokens scales up. Under comparable synthetic data budgets, models trained on FastMCTS-generated data outperform those trained on rejection-sampling data by 3.9% across multiple benchmarks.
Abstract
Synthetic high-quality multi-step reasoning data can significantly enhance the performance of large language models on various tasks. However, most existing methods rely on rejection sampling, which generates trajectories independently and suffers from inefficiency and imbalanced sampling across problems of varying difficulty. In this work, we introduce FastMCTS, an innovative data synthesis strategy inspired by Monte Carlo Tree Search. FastMCTS provides a more efficient sampling method for multi-step reasoning data, offering step-level evaluation signals and promoting balanced sampling across problems of different difficulty levels. Experiments on both English and Chinese reasoning datasets demonstrate that FastMCTS generates over 30% more correct reasoning paths compared to rejection sampling as the number of generated tokens scales up. Furthermore, under comparable synthetic data budgets, models trained on FastMCTS-generated data outperform those trained on rejection sampling data by 3.9% across multiple benchmarks. As a lightweight sampling strategy, FastMCTS offers a practical and efficient alternative for synthesizing high-quality reasoning data. Our code will be released soon.
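The abstract does not describe the search procedure itself. As a generic illustration of how an MCTS-style sampler can trade off exploration and exploitation when choosing which reasoning step to expand, here is a minimal sketch of the textbook UCT selection rule. It is not FastMCTS's actual algorithm. The names `uct_score` and `select_child`, the dictionary keys `value` and `visits`, and the exploration constant `c` are all hypothetical, as is the idea of using the fraction of correct rollouts through a step as its step-level signal.

```python
import math


def uct_score(total_value, visits, parent_visits, c=1.4):
    """Textbook UCT score: mean rollout value plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited steps are always tried first
    exploit = total_value / visits  # e.g. fraction of correct rollouts (assumed signal)
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, c=1.4):
    """Return the index of the candidate step with the highest UCT score.

    `children` is a list of dicts with hypothetical keys:
      'value'  - sum of rollout rewards through this step (1 per correct answer)
      'visits' - number of rollouts that passed through this step
    """
    parent_visits = max(1, sum(ch["visits"] for ch in children))
    return max(
        range(len(children)),
        key=lambda i: uct_score(
            children[i]["value"], children[i]["visits"], parent_visits, c
        ),
    )


# Two partially explored steps: one well-visited and mostly correct,
# one visited only once with a mediocre result.
candidates = [{"value": 9, "visits": 10}, {"value": 0.5, "visits": 1}]
greedy_choice = select_child(candidates, c=0.0)    # pure exploitation
balanced_choice = select_child(candidates, c=1.4)  # exploration bonus included
```

With `c=0.0` the rule always reuses the step with the best observed success rate. With a positive `c`, rarely visited steps receive a bonus, so sampling effort is spread over more of the tree instead of being spent on independent, possibly redundant trajectories.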