🤖 AI Summary
Existing large language models (LLMs) remain unreliable at verifying mathematical calculations. To address this, we propose MDToC (Metacognitive Dynamic Tree of Concepts), a three-stage framework: (1) automatic construction of an interpretable hierarchical concept tree; (2) generation of accuracy-verified sub-calculations for each concept; and (3) majority voting to evaluate competing solutions and select the final answer. The framework introduces the first metacognition-driven dynamic tree architecture, enabling automated solution-space pruning and trustworthy verification without hand-engineered hints, and positions “metacognitive verification” as a novel paradigm for mathematical reasoning. With GPT-4-Turbo, the method achieves 58.1%, 86.6%, and 85.0% accuracy on CHAMP, MATH, and Game-of-24, respectively. Across LLM backbones, it outperforms Tree-of-Thought (ToT) by up to 7.6 percentage points and Graph-of-Thought (GoT) by up to 6.2 percentage points.
📝 Abstract
Despite advances in mathematical reasoning, Large Language Models (LLMs) still struggle to verify calculations under established prompting techniques. We present MDToC (Metacognitive Dynamic Tree of Concepts), a three-phase approach that constructs a concept tree, develops accuracy-verified calculations for each concept, and employs majority voting to evaluate competing solutions. Evaluations on the CHAMP, MATH, and Game-of-24 benchmarks demonstrate MDToC's effectiveness: with GPT-4-Turbo, it achieves 58.1% on CHAMP, 86.6% on MATH, and 85% on Game-of-24, outperforming GoT by 5, 5.4, and 4 percentage points, respectively, without hand-engineered hints. MDToC consistently surpasses existing prompting methods across all backbone models, yielding improvements of up to 7.6 percentage points over ToT and 6.2 percentage points over GoT, establishing metacognitive calculation verification as a promising direction for enhanced mathematical reasoning.
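
The majority-voting phase can be illustrated with a minimal sketch. It assumes each competing solution has already been reduced to an (answer, verified) pair, where the flag records whether its calculations passed the accuracy check. The function name `majority_vote` and this data shape are hypothetical and are not taken from the paper.

```python
from collections import Counter

def majority_vote(candidates):
    """Pick the most frequent answer among candidates whose calculations were verified.

    `candidates` is a list of (answer, verified) pairs; this shape is an
    assumption made for illustration, not the paper's data structure.
    Returns None when no candidate passed verification.
    """
    verified_answers = [answer for answer, ok in candidates if ok]
    if not verified_answers:
        return None
    # most_common(1) yields [(answer, count)]; ties go to the answer seen first.
    return Counter(verified_answers).most_common(1)[0][0]

# Hypothetical candidates for a single problem.
candidates = [("24", True), ("24", True), ("25", False), ("23", True)]
print(majority_vote(candidates))
```

Filtering out unverified candidates before counting is one possible design choice; a variant could instead weight votes by verification outcome.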