FastMCTS: A Simple Sampling Strategy for Data Synthesis

📅 2025-02-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods for synthesizing multi-step reasoning data rely mostly on rejection sampling, which generates trajectories independently and is therefore inefficient and samples unevenly across problems of varying difficulty. This paper proposes FastMCTS, a lightweight sampling strategy inspired by Monte Carlo Tree Search (MCTS). Compared with rejection sampling, it samples multi-step reasoning data more efficiently, provides step-level evaluation signals, and promotes balanced sampling across difficulty levels. On English and Chinese reasoning datasets, FastMCTS generates over 30% more correct reasoning paths than rejection sampling as the number of generated tokens scales up. Under comparable synthetic data budgets, models trained on FastMCTS-generated data outperform those trained on rejection-sampled data by 3.9% across multiple benchmarks.

📝 Abstract

Synthetic high-quality multi-step reasoning data can significantly enhance the performance of large language models on various tasks. However, most existing methods rely on rejection sampling, which generates trajectories independently and suffers from inefficiency and imbalanced sampling across problems of varying difficulty. In this work, we introduce FastMCTS, an innovative data synthesis strategy inspired by Monte Carlo Tree Search. FastMCTS provides a more efficient sampling method for multi-step reasoning data, offering step-level evaluation signals and promoting balanced sampling across problems of different difficulty levels. Experiments on both English and Chinese reasoning datasets demonstrate that FastMCTS generates over 30% more correct reasoning paths compared to rejection sampling as the number of generated tokens scales up. Furthermore, under comparable synthetic data budgets, models trained on FastMCTS-generated data outperform those trained on rejection sampling data by 3.9% across multiple benchmarks. As a lightweight sampling strategy, FastMCTS offers a practical and efficient alternative for synthesizing high-quality reasoning data. Our code will be released soon.
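To make the baseline concrete, here is a minimal sketch of rejection sampling as the abstract describes it: every trajectory is drawn independently and kept only if it is judged correct. The per-problem success probabilities, the problem names, and the helpers `rejection_sample` and `correct_counts` are illustrative assumptions, not taken from the paper; a real pipeline would call a language model and an answer checker instead of a random draw.

```python
import random


def rejection_sample(solve_prob, n_samples, rng):
    """Count how many of n_samples independent trajectories are kept.

    solve_prob is a hypothetical stand-in for "the model solves this
    problem on a single attempt"; a real setup would sample from an LLM
    and verify the final answer.
    """
    return sum(1 for _ in range(n_samples) if rng.random() < solve_prob)


def correct_counts(problems, n_samples, seed=0):
    """Apply the same sampling budget to every problem and count kept paths."""
    rng = random.Random(seed)
    return {
        name: rejection_sample(prob, n_samples, rng)
        for name, prob in problems.items()
    }


# Assumed success rates, chosen only to illustrate the imbalance.
problems = {"easy": 0.9, "medium": 0.5, "hard": 0.05}
counts = correct_counts(problems, n_samples=200)
print(counts)
```

With an equal budget per problem, most kept trajectories come from easy problems and very few from hard ones, and nothing learned from one attempt is reused by the next. This is the inefficiency and difficulty imbalance that the abstract attributes to rejection sampling.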

Problem

Research questions and friction points this paper is trying to address.

Synthesizing high-quality multi-step reasoning data

Inefficient, difficulty-imbalanced sampling under rejection sampling

Improving language model performance with synthetic data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Monte Carlo Tree Search

Balanced sampling strategy

Step-level evaluation signals
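The ideas listed above can be illustrated with a generic, textbook MCTS sketch. It is not the FastMCTS algorithm itself, whose details are not given in this summary. The sketch assumes the standard UCT selection rule and a binary correctness reward; the names `Node`, `uct_score`, `select_child`, and `backpropagate` are hypothetical. Each node stands for a partial reasoning path, and its running average reward serves as a step-level evaluation signal.

```python
import math
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """One reasoning step in the search tree (hypothetical structure)."""
    step: str
    parent: "Node | None" = field(default=None, repr=False)
    children: list = field(default_factory=list)
    visits: int = 0
    wins: float = 0.0

    def value(self):
        # Fraction of rollouts through this step that ended correctly:
        # a Monte Carlo estimate usable as a step-level signal.
        return self.wins / self.visits if self.visits else 0.0


def uct_score(child, parent_visits, c=1.4):
    """Standard UCT: exploitation term plus exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited steps first
    return child.value() + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node, c=1.4):
    """Pick the child with the highest UCT score."""
    return max(node.children, key=lambda ch: uct_score(ch, node.visits, c))


def backpropagate(leaf, reward):
    """Credit a rollout's reward (1.0 correct, 0.0 wrong) to every ancestor."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.wins += reward
        node = node.parent


# Toy usage: two candidate first steps for one problem.
root = Node("problem")
a = Node("step A", parent=root)
b = Node("step B", parent=root)
root.children = [a, b]
for reward in (1.0, 1.0, 0.0):
    backpropagate(a, reward)
backpropagate(b, 0.0)
```

In this toy tree, step A has the higher estimated value, so a purely greedy choice (`c=0`) picks it, while the default exploration constant still sends the next rollout to the less-visited step B. Unlike independent sampling, the statistics of earlier rollouts are shared along common prefixes.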
