Accelerating Training Speed of Tiny Recursive Models via Curriculum Guided Adaptive Recursion

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Training small recursive reasoning models incurs high computational costs (e.g., 36 GPU-hours per dataset). Method: This work introduces curriculum learning along the recursive depth dimension, the first such approach, proposing a training paradigm based on progressive depth growth and hierarchical supervision weighting, augmented by exponentially decaying loss weights and dynamic recursive depth adjustment. Unlike prior methods, it avoids data reordering and instead systematically modulates the complexity of the model's recursive structure. Contribution/Results: The method achieves Pareto-optimal improvements in training efficiency and performance. On Sudoku-Extreme, training time is reduced by 42% (from 10.93 to 6.38 GPU-hours), with only a marginal accuracy drop of 0.63%. Inference halting accuracy reaches 100%, and the average number of inference steps decreases by 11%. This work establishes a novel, efficient training paradigm for small recursive models.

📝 Abstract
Recursive reasoning models achieve remarkable performance on complex reasoning tasks through iterative refinement, enabling tiny networks to match large language models thousands of times their size. However, training remains computationally expensive, with prior work reporting approximately 36 GPU-hours per dataset, limiting broader adoption and research. We propose CGAR, a novel training methodology that applies curriculum learning to architectural depth rather than traditional data ordering. CGAR introduces two synergistic components: Progressive Depth Curriculum, which dynamically adjusts recursion depth from shallow to deep configurations during training, preventing early overfitting while reducing computational cost; and Hierarchical Supervision Weighting, which applies exponentially decaying importance to supervision steps, aligning loss weighting with observed gradient magnitude decay. On Sudoku-Extreme with 423,168 test puzzles, CGAR achieves a 1.71x training speedup (10.93 to 6.38 hours, a 42% cost reduction) with only a 0.63% accuracy drop (86.65% to 86.02%). Systematic ablations reveal that Progressive Depth Curriculum alone achieves a 2.26x speedup with 85.47% accuracy, demonstrating a rare Pareto improvement where architectural curriculum simultaneously enhances training efficiency and solution quality. CGAR-trained models exhibit superior inference efficiency with 100% halting accuracy and 11% fewer reasoning steps. Our work demonstrates that a principled curriculum on architectural depth enables efficient training of recursive reasoning models on modest hardware. Code and models: https://github.com/Kaleemullahqasim/CGAR and https://huggingface.co/Kaleemullah/trm-cgar-sudoku
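The Hierarchical Supervision Weighting idea from the abstract can be sketched as follows. This is an illustrative reading, not the paper's implementation: the decay rate and normalization are hypothetical hyperparameters, and the paper's exact weighting scheme may differ.

```python
def supervision_weights(num_steps: int, decay: float = 0.5) -> list[float]:
    """Exponentially decaying, normalized weights for deep-supervision losses.

    Earlier supervision steps receive larger weight, mirroring the gradient
    magnitude decay the paper reports. `decay` is an assumed hyperparameter.
    """
    raw = [decay ** step for step in range(num_steps)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_supervision_loss(step_losses: list[float],
                              decay: float = 0.5) -> float:
    """Combine per-supervision-step losses using the decaying weights."""
    weights = supervision_weights(len(step_losses), decay)
    return sum(w * loss for w, loss in zip(weights, step_losses))
```

With `decay < 1`, later refinement steps contribute progressively less to the total loss, so the optimizer focuses on the early steps where gradients are largest.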
Problem

Research questions and friction points this paper is trying to address.

Accelerating the training of recursive reasoning models
Reducing computational cost while maintaining model accuracy
Enabling efficient training on modest hardware through curriculum learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum learning applied to architectural depth
Progressive Depth Curriculum adjusts recursion depth dynamically
Hierarchical Supervision Weighting decays supervision step importance
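The Progressive Depth Curriculum above can be illustrated with a simple schedule that grows recursion depth over training. This is a hedged sketch under assumed parameters (linear growth, depth range 2 to 6); the paper also adjusts depth dynamically, which a fixed schedule like this does not capture.

```python
def recursion_depth(epoch: int, total_epochs: int,
                    min_depth: int = 2, max_depth: int = 6) -> int:
    """Grow recursion depth from shallow to deep as training progresses.

    Early epochs use few recursion steps (cheap, less prone to early
    overfitting); later epochs use the full depth. Linear interpolation
    and the depth bounds are illustrative assumptions.
    """
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return min_depth + round(frac * (max_depth - min_depth))
```

Because shallow configurations dominate early training, the average per-step cost drops, which is consistent with the reported speedup from the depth curriculum alone.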
Kaleem Ullah Qasim
School of Computing and Artificial Intelligence, Southwest Jiaotong University
Reasoning in LLMs · Prompt Engineering · LLM Agents
Jiashu Zhang
School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan, China