AI Summary
Existing methods for synthesizing reasoning data struggle to effectively verify the correctness of intermediate steps in natural language multi-hop reasoning, particularly lacking reliable validation mechanisms when context is ambiguous or incomplete. This work proposes a structured synthetic data generation framework that, for the first time, integrates syllogistic reasoning principles into this task. By combining the generative capabilities of large language models with the supervisory power of symbolic reasoning engines, the approach employs a unified prompting template to produce modular reasoning chains and performs fine-grained symbolic verification on each step. This method overcomes limitations of prior approaches that rely solely on final answers or are confined to code- or structure-based tasks, achieving significant improvements over strong baselines across six benchmarks spanning logical, factual, and commonsense reasoning, thereby enhancing models' multi-step reasoning capabilities.
Abstract
Training large language models (LLMs) with synthetic reasoning data has become a popular approach to enhancing their reasoning capabilities. A key factor in the effectiveness of this paradigm is the quality of the generated multi-step reasoning data. To obtain high-quality reasoning data, many recent methods generate synthetic reasoning paths and filter them based on final answer correctness, often overlooking flaws in intermediate reasoning steps. To enhance the verification of intermediate reasoning steps, prior work primarily resorts to code execution or symbolic reasoning engines. However, code-based validation is restricted to code or mathematical tasks, and reasoning engines require a well-structured and complete context. As a result, existing methods fail to function effectively in natural language reasoning tasks that involve ambiguous or incomplete contexts. In these tasks, synthetic data still lack reliable checks for verifying each reasoning step. To address this challenge, we introduce ORACLE, a structured data generation framework inspired by syllogistic reasoning. ORACLE integrates the generative strengths of LLMs with symbolic supervision: the LLM produces step-wise reasoning contexts, while a symbolic reasoning engine verifies the validity of each intermediate step. By employing a unified prompting template to elicit modular reasoning chains, ORACLE enables fine-grained, step-level validation, facilitating the construction of high-quality multi-step reasoning data. Across six logical, factual, and commonsense reasoning benchmarks, ORACLE consistently outperforms strong baselines on multiple models.
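The idea of step-level validation described above can be illustrated with a minimal sketch. This is not ORACLE's implementation: the `Step` structure, the `entails` and `verify_chain` helpers, and the fact/rule encoding are all hypothetical, and a toy forward-chaining checker stands in for the symbolic reasoning engine. It assumes each reasoning step supplies its own premises (facts plus "every A is a B" rules) and a conclusion, so that each step can be checked in isolation rather than judging the chain by its final answer alone.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical encoding, chosen only for this illustration.
Fact = Tuple[str, str]  # (predicate, entity), e.g. ("human", "socrates")
Rule = Tuple[str, str]  # (antecedent, consequent), read as "every A is a B"


@dataclass
class Step:
    """One modular reasoning step: local premises plus the claimed conclusion."""
    facts: List[Fact]
    rules: List[Rule]
    conclusion: Fact


def entails(facts: List[Fact], rules: List[Rule], goal: Fact) -> bool:
    """Toy stand-in for a symbolic engine: forward-chain until a fixed point."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            for predicate, entity in list(known):
                derived = (consequent, entity)
                if predicate == antecedent and derived not in known:
                    known.add(derived)
                    changed = True
    return goal in known


def verify_chain(steps: List[Step]) -> List[bool]:
    """Check every step against its own premises, not just the final answer."""
    return [entails(s.facts, s.rules, s.conclusion) for s in steps]


# A chain in which both steps are valid syllogisms.
good_chain = [
    Step([("human", "socrates")], [("human", "mortal")], ("mortal", "socrates")),
    Step([("mortal", "socrates")], [("mortal", "finite")], ("finite", "socrates")),
]

# A chain whose second step affirms the consequent and should be rejected.
bad_chain = [
    Step([("human", "socrates")], [("human", "mortal")], ("mortal", "socrates")),
    Step([("mortal", "socrates")], [("human", "mortal")], ("human", "socrates")),
]

print(verify_chain(good_chain))
print(verify_chain(bad_chain))
```

In a data-generation pipeline of this kind, a chain would be kept for training only if every entry returned by `verify_chain` is true; the per-step results also show which step failed, which a final-answer filter cannot do.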