Enhancing LLM Metacognition via Cognitive Pairwise Training

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing methods for improving the reasoning reliability of large language models: they often teach surface-level refusal behavior rather than genuine discrimination of reasoning quality. The authors propose Cognitive Pairwise Training (CPT), a mid-training alignment stage that converts pairwise comparisons of reasoning traces into a reusable alignment signal, so that the model internalizes a boundary between trustworthy and flawed reasoning. Combined with reinforcement learning, CPT improves the trade-off between reasoning accuracy and metacognitive capabilities such as abstention. At 14B parameters, CPT+RL outperforms the standard SFT+RL pipeline by +2.2 points in math average and +5.2 points in abstention-F1. The improvement is reported across five model scales and three model families, and further analyses indicate robustness and scalability across evaluation and training settings.
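To make the abstention-F1 figure concrete, here is a minimal sketch of how such a metric could be computed. The definition is an assumption, since this page does not state the paper's exact formula: "should abstain" is treated as the positive class, and F1 is the harmonic mean of precision and recall over abstention decisions. The function name `abstention_f1` and its arguments are hypothetical.

```python
from typing import Sequence


def abstention_f1(should_abstain: Sequence[bool], did_abstain: Sequence[bool]) -> float:
    """F1 over abstention decisions (assumed definition, not taken from the paper).

    should_abstain[i]: True if abstaining was the right call on example i.
    did_abstain[i]:    True if the model actually abstained on example i.
    """
    pairs = list(zip(should_abstain, did_abstain))
    tp = sum(1 for s, d in pairs if s and d)        # correct abstentions
    fp = sum(1 for s, d in pairs if not s and d)    # unnecessary abstentions
    fn = sum(1 for s, d in pairs if s and not d)    # missed abstentions
    if tp == 0:
        return 0.0  # no correct abstentions: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# One correct abstention, one unnecessary, one missed: precision = recall = 0.5
print(abstention_f1([True, True, False, False], [True, False, True, False]))  # 0.5
```

Under this definition, a model that refuses everything gets perfect recall but low precision, so the metric penalizes over-refusal as well as overconfidence.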

📝 Abstract

Reinforcement learning with verifiable rewards (RLVR) has become central to LLM reasoning, but its outcome-level rewards can make models more willing to give confident answers when evidence or reasoning is unreliable. Existing SFT or RL methods mainly teach LLMs to refuse or express uncertainty at the response level, which can overfit abstention behavior rather than improve reasoning reliability. To address this limitation, we propose Cognitive Pairwise Training (CPT), a cognitive mid-training alignment stage that turns pairwise comparisons over reasoning traces into a reusable alignment signal. By learning to distinguish trustworthy from flawed reasoning, CPT encourages the model to internalize a reasoning-quality discrimination boundary rather than memorize surface refusal patterns. Across five model scales and three model families, CPT improves the reasoning-metacognition trade-off. At 14B, CPT+RL outperforms the standard SFT+RL pipeline by +2.2 math-average points and +5.2 abstention-F1 points. Further analyses show that CPT improves trace quality and exhibits strong robustness and scalability across evaluation and training settings. Code and models are released at https://github.com/Tsinghua-dhy/CPT.
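As a rough illustration of how a pairwise comparison over two reasoning traces can become a training signal, the sketch below uses a generic Bradley-Terry style logistic loss. This is a swapped-in, textbook objective and not necessarily the one CPT uses; the abstract does not specify the loss. The scalar trace scores and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def pairwise_logistic_loss(score_reliable: float, score_flawed: float) -> float:
    """-log(sigmoid(score_reliable - score_flawed)).

    Small when the trustworthy trace is scored above the flawed one.
    Generic pairwise objective, assumed here for illustration only.
    """
    margin = score_reliable - score_flawed
    # Numerically stable softplus(-margin), avoiding overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average loss over (reliable_score, flawed_score) pairs."""
    losses = [pairwise_logistic_loss(good, bad) for good, bad in pairs]
    return sum(losses) / len(losses)


# Hypothetical scores a model assigns to (trustworthy, flawed) trace pairs.
well_ranked = [(2.0, -1.0), (1.5, 0.0)]
mis_ranked = [(-1.0, 2.0), (0.0, 1.5)]
print(mean_pairwise_loss(well_ranked) < mean_pairwise_loss(mis_ranked))  # True
```

The loss depends only on the score difference within a pair, so the signal rewards ranking a trustworthy trace above a flawed one rather than matching any particular refusal phrasing.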

Problem

Research questions and friction points this paper is trying to address.

metacognition

reasoning reliability

abstention behavior

reinforcement learning

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cognitive Pairwise Training

metacognition

reasoning alignment