CC-LEARN: Cohort-based Consistency Learning

📅 2025-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the inconsistency and limited robustness of large language models (LLMs) in reasoning, this paper proposes a cohort-level consistency reinforcement learning framework. It generates cohorts of semantically similar questions via programmatic abstraction and jointly optimizes cohort-level accuracy, a retrieval-augmented gain, and a rejection penalty to enforce uniform reasoning paths across the questions within each cohort. The paper introduces the first cohort-level consistency modeling paradigm and designs a multi-objective composite reward that reinforcement learning can directly optimize, overcoming the limitation of supervised fine-tuning in capturing cross-sample reasoning constraints. Built on the PPO algorithm with retrieval-augmented reasoning, the method achieves significant improvements in both accuracy and reasoning stability on ARC-Challenge and StrategyQA, consistently outperforming pretrained and supervised fine-tuning baselines.

📝 Abstract
Large language models excel at many tasks but still struggle with consistent, robust reasoning. We introduce Cohort-based Consistency Learning (CC-Learn), a reinforcement learning framework that improves the reliability of LLM reasoning by training on cohorts of similar questions derived from shared programmatic abstractions. To enforce cohort-level consistency, we define a composite objective combining cohort accuracy, a retrieval bonus for effective problem decomposition, and a rejection penalty for trivial or invalid lookups that reinforcement learning can directly optimize, unlike supervised fine-tuning. Optimizing this reward guides the model to adopt uniform reasoning patterns across all cohort members. Experiments on challenging reasoning benchmarks (including ARC-Challenge and StrategyQA) show that CC-Learn boosts both accuracy and reasoning stability over pretrained and SFT baselines. These results demonstrate that cohort-level RL effectively enhances reasoning consistency in LLMs.
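The abstract describes a composite cohort-level objective: cohort accuracy, plus a retrieval bonus for effective decomposition, minus a rejection penalty for trivial or invalid lookups. A minimal sketch of such a reward is below; the function name, weights, and inputs are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a cohort-level composite reward.
# All names and weights are assumptions for illustration only.

def cohort_reward(answers, gold, retrieval_hits, invalid_lookups,
                  w_acc=1.0, w_ret=0.5, w_rej=0.5):
    """Score one cohort of semantically similar questions.

    answers / gold: model answers and references, one per cohort member.
    retrieval_hits: retrieval calls that returned useful evidence.
    invalid_lookups: trivial or invalid lookups to penalize.
    """
    n = len(gold)
    accuracy = sum(a == g for a, g in zip(answers, gold)) / n
    retrieval_bonus = w_ret * retrieval_hits / n
    rejection_penalty = w_rej * invalid_lookups / n
    # RL (e.g. PPO) maximizes this scalar per cohort, pushing the
    # policy toward uniform reasoning patterns across cohort members.
    return w_acc * accuracy + retrieval_bonus - rejection_penalty
```

Because the reward is computed over a whole cohort rather than per question, a policy-gradient method can credit reasoning patterns that succeed consistently, which per-example supervised fine-tuning cannot express.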
Problem

Research questions and friction points this paper is trying to address.

LLMs answer semantically similar questions inconsistently and lack robust reasoning
Supervised fine-tuning cannot capture cross-sample reasoning constraints
Reasoning reliability degrades on challenging benchmarks such as ARC-Challenge and StrategyQA
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning for consistent reasoning
Cohort-based training with shared abstractions
Composite objective combining cohort accuracy, a retrieval bonus, and a rejection penalty