Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning

📅 2024-10-14

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) often make logical and arithmetic errors in mathematical reasoning, which lowers their accuracy. Method: This paper proposes Chain of Self-Correction (CoSC), a mechanism in which the LLM generates a program, executes it with program-based tools, verifies the output, and then either moves on to another correction stage or finalizes the answer. Self-correction is embedded in the model as an inherent ability. Training uses two-phase fine-tuning: first on a relatively small volume of seed data generated from GPT-4, then on a larger volume of self-generated data that does not rely on GPT-4. At inference, CoSC operates zero-shot, without demonstrations. Results: The CoSC-Code-34B model scores 53.5% on the MATH dataset, outperforming ChatGPT, GPT-4, and multimodal models such as GPT-4V and Gemini-1.0.

📝 Abstract

Accurate mathematical reasoning with Large Language Models (LLMs) is crucial in revolutionizing domains that heavily rely on such reasoning. However, LLMs often encounter difficulties in certain aspects of mathematical reasoning, leading to flawed reasoning and erroneous results. To mitigate these issues, we introduce a novel mechanism, the Chain of Self-Correction (CoSC), specifically designed to embed self-correction as an inherent ability in LLMs, enabling them to validate and rectify their own results. The CoSC mechanism operates through a sequence of self-correction stages. In each stage, the LLMs generate a program to address a given problem, execute this program using program-based tools to obtain an output, and subsequently verify this output. Based on the verification, the LLMs either proceed to the next correction stage or finalize the answer. This iterative self-correction process allows the LLMs to refine their reasoning steps and improve the accuracy of their mathematical reasoning. We implement CoSC using a two-phase fine-tuning approach. First, LLMs are trained with a relatively small volume of seeding data generated from GPT-4. Then, we enhance CoSC by training with a larger volume of self-generated data, without relying on GPT-4. Experiments show that CoSC significantly boosts performance on standard mathematical datasets compared to existing open-source LLMs. Notably, our CoSC-Code-34B model achieved a 53.5% score on the challenging MATH dataset, outperforming models like ChatGPT, GPT-4, and multi-modal LLMs such as GPT-4V and Gemini-1.0. Importantly, CoSC operates in a zero-shot manner without requiring demonstrations.
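
The stage loop described in the abstract (generate a program, execute it, verify the output, then either continue or finalize) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `chain_of_self_correction`, `run_program`, `Stage`, `toy_generate`, and `toy_verify` are hypothetical, the cap of three stages is an assumption, and the two toy functions are hard-coded stand-ins for what the fine-tuned LLM would produce in each stage.

```python
import contextlib
import io
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One self-correction stage: the program, its output, and the verdict."""
    program: str
    output: str
    verified: bool


def run_program(program: str) -> str:
    """Execute a generated Python program and return what it printed."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # exec is unsafe for untrusted code; a real system would sandbox this.
            exec(program, {})
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue().strip()


def chain_of_self_correction(
    question: str,
    generate: Callable[[str, List[Stage]], str],
    verify: Callable[[str, str, str], bool],
    max_stages: int = 3,  # assumed cap; the source does not state one
) -> Tuple[str, List[Stage]]:
    """Repeat generate -> execute -> verify until verification passes."""
    history: List[Stage] = []
    for _ in range(max_stages):
        program = generate(question, history)
        output = run_program(program)
        ok = verify(question, program, output)
        history.append(Stage(program, output, ok))
        if ok:
            break  # finalize the answer
    return history[-1].output, history


# Hypothetical stand-ins for the model. In CoSC the same LLM would write
# both the program and the verification; here both are hard-coded.
def toy_generate(question: str, history: List[Stage]) -> str:
    if not history:
        return "print(sum(range(1, 10)))"  # first attempt: off-by-one bug
    return "print(sum(range(1, 11)))"      # corrected attempt


def toy_verify(question: str, program: str, output: str) -> bool:
    # Checks the output against the closed form n(n+1)/2 for n = 10.
    return output == str(10 * 11 // 2)


if __name__ == "__main__":
    question = "What is the sum of the integers from 1 to 10?"
    answer, stages = chain_of_self_correction(question, toy_generate, toy_verify)
    print(answer, [s.output for s in stages])
```

In this toy run the first stage fails verification, so a second stage is triggered and its output is taken as the final answer. Passing `history` back into `generate` is one way to let later stages condition on earlier programs and outputs; the abstract does not specify how that conditioning is done.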

Problem

Research questions and friction points this paper is trying to address.

Enhance mathematical reasoning in Large Language Models.

Embed self-correction as an inherent ability in LLMs.

Improve accuracy with the Chain of Self-Correction mechanism.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain of Self-Correction mechanism

Two-phase fine-tuning approach

Zero-shot operation without demonstrations

🔎 Similar Papers

S3c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners