🤖 AI Summary
Existing large language models suffer from computational redundancy in multi-step reasoning: they over-deploy tree search on simple tasks and fail to recognize semantically equivalent reasoning paths, leading to redundant computation and reduced efficiency. This paper proposes Semantic Exploration with Adaptive Gating (SEAG), a semantic-aware framework that (1) dynamically activates and terminates tree search based on task difficulty, presented as the first mechanism of its kind, and (2) introduces semantic consistency aggregation within tree search to automatically detect and merge semantically equivalent paths. The method comprises three core components: an adaptive confidence gating module, a lightweight prefix reasoner, and a plug-and-play tree search architecture. Evaluated on the GSM8K and ARC benchmarks, SEAG improves accuracy by 4.3% on average over strong baselines while reducing computational cost to just 31% of that of mainstream tree search methods. It is fully compatible with prevalent open-weight models, including Llama2, Llama3, and Mistral.
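The semantic consistency aggregation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the equivalence check here is a crude textual normalization stand-in (SEAG presumably uses a learned semantic comparison), and the function names and score representation are assumptions.

```python
import re


def normalize(step: str) -> str:
    """Crude stand-in for a semantic-equivalence check: lowercase and
    collapse punctuation/whitespace. A real system would likely use an
    embedding-similarity or entailment model instead."""
    return re.sub(r"[^a-z0-9]+", " ", step.lower()).strip()


def consolidate(steps_with_scores):
    """Merge semantically equivalent candidate reasoning steps, summing
    their scores, so the tree expands each distinct idea only once."""
    merged = {}  # normalized form -> (representative step, total score)
    for step, score in steps_with_scores:
        key = normalize(step)
        if key in merged:
            kept, total = merged[key]
            merged[key] = (kept, total + score)
        else:
            merged[key] = (step, score)
    return list(merged.values())


# Two of these three candidate steps say the same thing, so only two
# distinct branches survive, with their scores pooled.
candidates = [
    ("Add 3 and 4.", 0.5),
    ("add 3 and 4", 0.25),
    ("Multiply 3 by 4.", 0.25),
]
print(consolidate(candidates))
```

Pooling scores of merged paths is one plausible aggregation rule; the key point is that the search budget is spent on semantically distinct branches rather than paraphrases of the same step.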
📝 Abstract
Recent advances in large language models (LLMs) have shown remarkable potential on complex tasks that require multi-step reasoning, using methods such as tree search to explore diverse reasoning paths. However, existing methods often suffer from computational inefficiency and redundancy. First, they overlook the diversity of task difficulties, leading to unnecessarily extensive searches even for easy tasks. Second, they neglect the semantics of reasoning paths, resulting in redundant exploration of semantically identical paths. To address these limitations, we propose Semantic Exploration with Adaptive Gating (SEAG), a computationally efficient method. SEAG employs an adaptive gating mechanism that dynamically decides whether to conduct a tree search, based on the confidence of answers from a preceding simple reasoning pass. Furthermore, its tree-based exploration consolidates semantically identical reasoning steps, reducing redundant exploration while maintaining or even improving accuracy. Our extensive experiments demonstrate that SEAG significantly improves accuracy, by 4.3% on average, while requiring only 31% of the computational cost of existing tree-search-based methods, on complex reasoning benchmarks including GSM8K and ARC and across diverse language models such as Llama2, Llama3, and Mistral.
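The adaptive gating idea in the abstract can be sketched as below. This is a hedged illustration under stated assumptions: the confidence measure here is a simple majority-vote fraction over sampled answers (a common proxy; the paper's exact measure may differ), and `sample_answers`, `tree_search`, and the threshold value are hypothetical placeholders for the cheap reasoning pass and the expensive search.

```python
from collections import Counter


def majority_confidence(answers):
    """Confidence proxy for a cheap reasoning pass: the fraction of
    sampled answers that agree with the most common answer."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)


def adaptive_gate(sample_answers, tree_search, threshold=0.8):
    """Run the cheap method first; escalate to tree search only when
    its confidence falls below `threshold`.

    sample_answers: callable returning a list of sampled answers.
    tree_search:    callable running the expensive search, returning an answer.
    Returns (answer, method_used).
    """
    answers = sample_answers()
    if majority_confidence(answers) >= threshold:
        # Confident enough: accept the majority answer, skip the search.
        return Counter(answers).most_common(1)[0][0], "simple"
    # Low agreement suggests a hard instance: fall back to tree search.
    return tree_search(), "tree_search"


# Easy instance: 4 of 5 samples agree, so the gate skips tree search.
print(adaptive_gate(lambda: ["12", "12", "12", "12", "11"], lambda: "12"))
# Hard instance: samples disagree, so the gate escalates.
print(adaptive_gate(lambda: ["7", "8", "9", "7", "6"], lambda: "8"))
```

The design point is that tree search cost is paid only on low-confidence (likely hard) instances, which is how a method like this can cut average compute while preserving accuracy.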