Reasoning Pattern Matters: Learning to Reason without Human Rationales

Date: 2025-10-14
Citations: 0
Influential: 0
AI Summary
Problem: High-quality human annotation of reasoning traces (rationales) is prohibitively expensive. Method: This paper argues that, for patterned reasoning tasks, the reasoning *pattern* (rather than the quantity or quality of rationales) is the primary determinant of model reasoning performance. It proposes PARO, a framework in which LLMs generate rationales aligned with a task's reasoning pattern without requiring human rationale annotations; these rationales are used for supervised fine-tuning (SFT), followed by Reinforcement Learning with Verifiable Rewards (RLVR). Contribution/Results: On numerical semantic matching, PARO-generated rationales yield SFT+RLVR performance comparable to that of a human-annotated rationale set ten times larger, substantially reducing annotation cost. The results suggest that large-scale human rationale annotation can be replaced by LLM-based automatic annotation that needs only limited human supervision over reasoning patterns.

Abstract
Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities under the widely adopted SFT+RLVR paradigm, which first performs Supervised Fine-Tuning (SFT) on human-annotated reasoning trajectories (rationales) to establish initial reasoning behaviors, then applies Reinforcement Learning with Verifiable Rewards (RLVR) to optimize the model using verifiable signals without golden rationales. However, annotating high-quality rationales for the SFT stage remains prohibitively expensive. This paper investigates when and how rationale annotation costs can be substantially reduced without compromising reasoning performance. We identify a broad class of problems, termed patterned reasoning tasks, where reasoning follows a fixed, procedural strategy consistent across instances. Although instances vary in content such as domain knowledge, factual information, or numeric values, the solution derives from applying a shared reasoning pattern. We argue that the success of SFT+RLVR on such tasks primarily stems from its ability to enable models to internalize these reasoning patterns. Using numerical semantic matching as a representative task, we provide both causal and behavioral evidence showing that reasoning patterns, rather than the quantity or quality of rationales, are the key determinant of performance. Building on these insights, we propose Pattern-Aware LLMs as Rationale AnnOtators (PARO), a simple yet effective framework that enables LLMs to generate rationales aligned with task-specific reasoning patterns without requiring human rationale annotations. Experiments show that PARO-generated rationales achieve SFT+RLVR performance comparable to that of a human rationale set 10 times larger. These results suggest that large-scale human rationale annotations can be replaced with LLM-based automatic annotations requiring only limited human supervision over reasoning patterns.
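To make the "verifiable signals without golden rationales" idea concrete, here is a minimal sketch of a rule-based reward that scores only the final answer and ignores the rationale text. The `<answer>` tag format and the function name `verifiable_reward` are illustrative assumptions, not details taken from the paper.

```python
import re


def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Return 1.0 if the final answer in `completion` matches `gold_answer`, else 0.0.

    Assumption: the model wraps its final answer in <answer>...</answer> tags.
    The rationale preceding the tags is never inspected, so no golden
    rationale is required to compute this signal.
    """
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    if match is None:
        # No parseable final answer: treat as incorrect.
        return 0.0
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == gold_answer.strip().lower() else 0.0
```

A reward of this kind needs only a gold label per instance, which is why the costly rationale annotations are confined to the SFT stage.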
Problem

Research questions and friction points this paper is trying to address.

Reducing costly human rationale annotations for LLM reasoning tasks
Identifying patterned reasoning tasks with consistent procedural strategies
Developing automated rationale generation aligned with reasoning patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pattern-aware LLMs generate reasoning rationales automatically
Replaces human annotations with model-generated procedural patterns
Reduces rationale costs while maintaining reasoning performance
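As a rough illustration of pattern-aware rationale generation, the sketch below assembles an annotation prompt from a human-written description of the reasoning pattern. The function name `build_annotation_prompt`, the prompt wording, and the example steps are hypothetical; the paper's actual prompts are not shown in this summary.

```python
def build_annotation_prompt(pattern_steps: list[str], question: str) -> str:
    """Build a prompt asking an LLM to write a rationale that follows a fixed pattern.

    Assumption: the reasoning pattern is supplied as an ordered list of short,
    human-written step descriptions; this is the only human supervision used.
    """
    numbered = "\n".join(
        f"{i}. {step}" for i, step in enumerate(pattern_steps, start=1)
    )
    return (
        "Follow this reasoning pattern step by step:\n"
        f"{numbered}\n\n"
        f"Problem: {question}\n"
        "Write a rationale that follows the pattern, then state the final answer."
    )


# Hypothetical pattern for a matching-style task (illustrative only).
example_steps = [
    "Identify what each numeric mention refers to.",
    "Compare the two mentions along each semantic dimension.",
    "Decide whether the mentions match.",
]
prompt = build_annotation_prompt(example_steps, "Do the two mentions match?")
```

The resulting string would be sent to whichever LLM serves as the annotator; the human effort is limited to writing the pattern steps once per task.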
Chaoxu Pang
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences.
Yixuan Cao
Shenzhen University
Software Engineering, Security, Kernel & Compiler, Testing & Verification, Big Data
Ping Luo
National University of Defense Technology
Distributed Computing