Reasoning Pattern Matters: Learning to Reason without Human Rationales

📅 2025-10-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: High-quality human annotation of reasoning traces (rationales) for the SFT stage is prohibitively expensive. Method: This paper identifies reasoning *patterns*, rather than the quantity or quality of rationales, as the primary determinant of performance on patterned reasoning tasks. It proposes PARO (Pattern-Aware LLMs as Rationale AnnOtators), a framework in which LLMs generate rationales aligned with task-specific reasoning patterns without human rationale annotations. The generated rationales then feed the standard pipeline of supervised fine-tuning (SFT) followed by Reinforcement Learning with Verifiable Rewards (RLVR). Contribution/Results: On numerical semantic matching, PARO-generated rationales achieve SFT+RLVR performance comparable to that of a human-annotated rationale set 10 times larger. The results suggest that large-scale human rationale annotation can be replaced by LLM-based automatic annotation that requires only limited human supervision over reasoning patterns.

📝 Abstract

Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities under the widely adopted SFT+RLVR paradigm, which first performs Supervised Fine-Tuning (SFT) on human-annotated reasoning trajectories (rationales) to establish initial reasoning behaviors, then applies Reinforcement Learning with Verifiable Rewards (RLVR) to optimize the model using verifiable signals without golden rationales. However, annotating high-quality rationales for the SFT stage remains prohibitively expensive. This paper investigates when and how rationale annotation costs can be substantially reduced without compromising reasoning performance. We identify a broad class of problems, termed patterned reasoning tasks, where reasoning follows a fixed, procedural strategy consistent across instances. Although instances vary in content such as domain knowledge, factual information, or numeric values, the solution derives from applying a shared reasoning pattern. We argue that the success of SFT+RLVR on such tasks primarily stems from its ability to enable models to internalize these reasoning patterns. Using numerical semantic matching as a representative task, we provide both causal and behavioral evidence showing that reasoning patterns, rather than the quantity or quality of rationales, are the key determinant of performance. Building on these insights, we propose Pattern-Aware LLMs as Rationale AnnOtators (PARO), a simple yet effective framework that enables LLMs to generate rationales aligned with task-specific reasoning patterns without requiring human rationale annotations. Experiments show that PARO-generated rationales achieve SFT+RLVR performance comparable to that of human rationale sets 10 times larger. These results suggest that large-scale human rationale annotations can be replaced with LLM-based automatic annotations requiring only limited human supervision over reasoning patterns.
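
The RLVR stage described in the abstract optimizes the model against a verifiable signal rather than a golden rationale. A minimal sketch of such a signal is shown here, assuming a binary exact-match reward on the final answer. The `Answer: <label>` output format, the function name `verifiable_reward`, and the example labels are all hypothetical and not taken from the paper.

```python
import re

# Assumption: the model ends its output with a line like "Answer: <label>".
ANSWER_RE = re.compile(r"Answer:\s*(\w+)\s*$", re.IGNORECASE)


def verifiable_reward(model_output: str, gold_label: str) -> float:
    """Return 1.0 if the final answer matches the gold label, else 0.0.

    Only the final answer is checked; the rationale text before it is
    never compared against a reference rationale.
    """
    match = ANSWER_RE.search(model_output.strip())
    if match is None:
        # No parseable answer: treat as incorrect.
        return 0.0
    return 1.0 if match.group(1).lower() == gold_label.lower() else 0.0
```

Because the reward depends only on the final label, it can be computed for any sampled output without a human-written rationale, which is what allows the RL stage to run without rationale annotations.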

Problem

Research questions and friction points this paper is trying to address.

Reducing costly human rationale annotations for LLM reasoning tasks

Identifying patterned reasoning tasks with consistent procedural strategies

Developing automated rationale generation aligned with reasoning patterns

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pattern-aware LLMs generate reasoning rationales automatically

Replaces large-scale human rationale annotation with LLM-generated rationales, keeping only limited human supervision over reasoning patterns

Reduces rationale annotation costs while maintaining reasoning performance
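
One way to picture pattern-aware rationale generation is as prompt construction: a short, human-written reasoning pattern is placed in the prompt, and the LLM is asked to follow it on each instance. The sketch below is an illustration under that assumption, not the paper's implementation. The function names, the `Step k:` output format, and the structural check are hypothetical.

```python
from typing import List


def build_pattern_prompt(pattern_steps: List[str], instance: str) -> str:
    """Build a prompt asking an LLM to follow a fixed reasoning pattern."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(pattern_steps, 1))
    return (
        "Solve the task by following this reasoning pattern step by step.\n"
        f"Reasoning pattern:\n{numbered}\n\n"
        f"Instance:\n{instance}\n\n"
        "Write one line per step, prefixed 'Step k:', then 'Answer: <label>'."
    )


def follows_pattern(rationale: str, num_steps: int) -> bool:
    """Check that every 'Step k:' marker appears, in increasing order."""
    pos = 0
    for k in range(1, num_steps + 1):
        idx = rationale.find(f"Step {k}:", pos)
        if idx < 0:
            return False
        pos = idx + 1
    return True


if __name__ == "__main__":
    # Placeholder steps for illustration only.
    steps = ["Identify the quantities involved", "Compare them and decide"]
    print(build_pattern_prompt(steps, "<task instance text>"))
```

In this sketch the only human input is the list of pattern steps, which mirrors the abstract's point that supervision is needed over the reasoning pattern rather than over each individual rationale.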
