Capability Self-Assessment: Teaching LLMs to Know Their Limits

📅 2026-05-29

🤖 AI Summary
This work addresses the tendency of large language models to overestimate their capabilities and struggle with accurately assessing task solvability. It introduces, for the first time, capability self-assessment (CSA) as a policy learning problem, training models via reinforcement learning to dynamically decide whether to answer a query themselves or delegate it elsewhere. This approach preserves the model's original competencies while significantly improving the accuracy of self-assessment. Compared to supervised fine-tuning, reinforcement learning yields markedly better CSA performance and demonstrates strong out-of-distribution generalization. The proposed mechanism effectively enhances decision-making in local-cloud collaborative inference and optimizes the selection of training data.
📝 Abstract
The ability to recognize one's own limitations and decide whether to solve a problem or delegate is fundamental for reliable intelligent systems. Yet we show that modern large language models systematically lack this ability: across diverse model families and scales, they overestimate their competence and attempt queries they cannot solve. We refer to this ability as Capability Self-Assessment (CSA) and formulate it as a policy-learning problem, aiming to improve self-assessment while preserving the model's original capabilities. Our results show that reinforcement learning teaches CSA effectively, significantly outperforming supervised fine-tuning while preserving original capabilities. In contrast, supervised fine-tuning severely degrades the capabilities the model is meant to assess. Moreover, learned self-assessment behavior generalizes well out of distribution, suggesting that CSA is a transferable model trait. Finally, CSA is practically useful: it improves local-cloud decision making at inference time and provides a signal for targeted data selection during training.
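The answer-or-delegate decision described in the abstract can be illustrated with a small sketch. The abstract does not specify the reward or the evaluation metric, so everything below is an assumption made for illustration: the `Episode` record, the symmetric +1/-1 reward in `csa_reward`, and the `csa_accuracy` helper are hypothetical names, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One query, as seen by a hypothetical CSA evaluator."""
    delegated: bool  # the policy chose to hand the query off
    solvable: bool   # ground truth: the model's own answer would have been correct


def is_correct_decision(ep: Episode) -> bool:
    # Correct choices: answer when solvable, delegate when not solvable.
    return ep.delegated != ep.solvable


def csa_reward(ep: Episode) -> float:
    """Assumed reward shape: +1 for a correct decision, -1 otherwise."""
    return 1.0 if is_correct_decision(ep) else -1.0


def csa_accuracy(episodes: Sequence[Episode]) -> float:
    """Fraction of queries on which the answer-or-delegate choice was correct."""
    return sum(is_correct_decision(ep) for ep in episodes) / len(episodes)


episodes = [
    Episode(delegated=False, solvable=True),   # answered a solvable query
    Episode(delegated=True, solvable=False),   # delegated an unsolvable query
    Episode(delegated=False, solvable=False),  # overconfidence: attempted an unsolvable query
    Episode(delegated=True, solvable=True),    # needless delegation
]
rewards = [csa_reward(ep) for ep in episodes]
accuracy = csa_accuracy(episodes)
```

The third episode is the failure mode the abstract highlights: the model attempts a query it cannot solve. A real setup would likely weight the two error types differently, for example by charging a cost for each delegation to a cloud model, but the abstract gives no such details.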
Problem

Research questions and friction points this paper is trying to address.

Capability Self-Assessment
Large Language Models
Self-Awareness
Overconfidence
Delegation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Capability Self-Assessment
Reinforcement Learning
Out-of-Distribution Generalization
Local-Cloud Decision Making
Policy Learning