🤖 AI Summary
Supervisory signals for large language model (LLM) reasoning, particularly chain-of-thought (CoT) annotations, are hard to obtain reliably: human annotation is costly, while self-generated reasoning traces are error-prone and unverifiable.
Method: This paper proposes Code2CoT, a new paradigm that automatically constructs verifiable, stepwise CoT supervision data from program execution traces. It uses symbolic execution to extract deterministic execution paths, then combines structured code-to-natural-language translation with CoT distillation, enabling fully automated, high-fidelity, and scalable supervision generation without human annotation.
Contribution/Results: Code2CoT significantly improves LLM generalization across mathematical reasoning, symbolic deduction, and multi-hop question answering benchmarks, while cutting inference token consumption by mitigating redundant reasoning and repetitive generation. Its core innovation is grounding the construction of CoT supervision in the intrinsic determinism and verifiability of program execution, a first in the field.
📝 Abstract
Training large language models (LLMs) with chain-of-thought (CoT) supervision has proven effective for enhancing their reasoning abilities. However, obtaining reliable and accurate reasoning supervision remains a significant challenge. We propose a scalable method for generating a high-quality CoT supervision dataset by leveraging the determinism of program execution. Unlike existing reasoning dataset generation methods that rely on costly human annotations or error-prone LLM-generated CoT, our approach extracts verifiable, step-by-step reasoning traces from code execution and transforms them into natural-language CoT reasoning. Experiments on reasoning benchmarks across various domains show that our method effectively equips LLMs with transferable reasoning abilities across diverse tasks. Furthermore, ablation studies confirm that our method produces highly accurate reasoning data and reduces overall token length during inference by curbing meaningless repetition and overthinking.
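The core idea, turning a deterministic execution trace into step-by-step natural-language supervision, can be sketched in miniature. This is an illustrative toy, not the paper's actual pipeline: the paper uses symbolic execution, whereas the sketch below simply records concrete execution states with Python's standard `sys.settrace` hook and renders each state as an English sentence; the function names (`trace_program`, `steps_to_cot`, `sum_to_n`) are invented for this example.

```python
import sys

def trace_program(func, *args):
    """Record a (line number, local-variable snapshot) for each executed
    line of func. Because execution is deterministic, this sequence is a
    verifiable ground-truth record of the computation."""
    steps = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            steps.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always detach the tracer
    return result, steps

def steps_to_cot(steps):
    """Render each recorded execution state as a natural-language
    reasoning step (a stand-in for the paper's structured
    code-to-natural-language translation)."""
    cot = []
    for i, (_, local_vars) in enumerate(steps, 1):
        state = ", ".join(f"{k} = {v!r}" for k, v in local_vars.items())
        cot.append(f"Step {i}: the program state is now {state}.")
    return cot

def sum_to_n(n):
    # Toy target program: sum the integers 1..n.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

result, steps = trace_program(sum_to_n, 3)
cot = steps_to_cot(steps)
print(f"Answer: {result}")  # Answer: 6
for line in cot:
    print(line)
```

Each emitted step is checkable against the recorded program state, which is the property the paper exploits: unlike LLM-generated CoT, supervision derived this way is correct by construction.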