Iterative Deepening Sampling for Large Language Models

📅 2025-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited self-assessment and self-correction capabilities of large language models (LLMs) on complex reasoning tasks, this paper proposes the Iterative Deepening Sampling (IDS) framework. IDS uses a manually triggered mechanism to jointly improve intra-response and inter-response self-reflection. The method integrates heuristic sampling control, response re-scoring, multi-round self-verification, and backtracking-based correction, combining deterministic decoding constraints with uncertainty-aware resampling. Crucially, IDS operates without relying on high-quality human feedback or reinforcement learning signals. Experiments demonstrate substantial improvements in problem-solving success rates on the Math500 and AIME benchmarks, particularly on challenging problems, and ablation studies quantify each component's contribution to reasoning depth and answer robustness.

📝 Abstract
The recent release of OpenAI's o1 models and other similar frameworks showcasing test-time scaling laws has demonstrated their exceptional capability to tackle complex reasoning tasks. Inspired by this, subsequent research has revealed that such test-time scaling laws hinge on the model's ability to search both within a single response (intra-response) and across multiple responses (inter-response) during training. Crucially, beyond selecting a single optimal response, the model must also develop robust self-correction capabilities within its own outputs. However, training models to achieve effective self-evaluation and self-correction remains a significant challenge, heavily dependent on the quality of self-reflection data. In this paper, we address this challenge by focusing on enhancing the quality of self-reflection data generation for complex problem-solving, which can subsequently improve the training of next-generation large language models (LLMs). Specifically, we explore how manually triggering a model's self-correction mechanisms can improve performance on challenging reasoning tasks. To this end, we propose a novel iterative deepening sampling algorithm framework designed to enhance self-correction and generate higher-quality samples. Through extensive experiments on Math500 and AIME benchmarks, we demonstrate that our method achieves a higher success rate on difficult tasks and provide detailed ablation studies to analyze its effectiveness across diverse settings.
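The core idea in the abstract, iteratively deepening the sampling budget while manually triggering the model's self-correction on its own best response so far, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate`, `score`, and the reflection `trigger` string are all hypothetical placeholders for a model call, a response-quality heuristic, and a self-correction prompt.

```python
def iterative_deepening_sample(generate, score, prompt,
                               max_rounds=3, samples_per_round=2,
                               trigger="Wait, let me re-examine my reasoning."):
    """Hedged sketch of iterative deepening sampling.

    Each round samples several candidates, keeps the highest-scoring one,
    then deepens the search by appending the best response plus a manual
    self-correction trigger to the prompt before resampling.
    """
    best, best_score = None, float("-inf")
    context = prompt
    for _ in range(max_rounds):
        for _ in range(samples_per_round):
            candidate = generate(context)       # hypothetical LLM call
            s = score(candidate)                # hypothetical quality heuristic
            if s > best_score:
                best, best_score = candidate, s
        # Deepen: ask the model to reflect on its current best answer.
        context = prompt + "\n" + best + "\n" + trigger
    return best, best_score
```

In a real setting, `score` could be a verifier or a re-scoring model, and the loop's transcripts would double as the self-reflection training data the paper aims to generate; those specifics are assumptions here, not claims from the source.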
Problem

Research questions and friction points this paper is trying to address.

Enhancing self-reflection data quality
Improving self-correction in LLMs
Boosting complex reasoning task performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative deepening sampling algorithm
Enhances self-correction mechanisms
Improves quality of self-reflection data