The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust

📅 2026-06-05

🤖 AI Summary

Large language models commonly exhibit overconfidence and poor calibration. Calibration on its own is also easy to game: an uninformative strategy, such as always predicting the base rate, can look perfectly calibrated, which makes it hard to balance reliability with utility. This work proposes ACUTE, a protocol that uses a language model's internal activations to provide a general-purpose, sample-efficient, and computationally lightweight approach to confidence estimation. ACUTE applies across diverse tasks, including multiple-choice question answering, tool calling, and scientific summarization. The work also introduces EURO, a new metric that jointly evaluates calibration quality and informativeness. Evaluated on six large language models from four model families, ACUTE outperforms strong baselines on EURO while maintaining low calibration error, improving both the trustworthiness and the practical utility of model predictions.
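The summary does not say which activations ACUTE reads or which estimator it fits on them. As a minimal sketch of the general idea of activation-based confidence estimation, assuming a logistic-regression probe on one layer's hidden states (a common technique, not necessarily ACUTE's), the code below trains a probe to predict whether each answer is correct and uses its output probability as the confidence. The activations and labels are synthetic stand-ins, and `fit_linear_probe` and `probe_confidence` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-example hidden states from one model layer
# (hypothetical: real use would extract these from the language model).
n_examples, hidden_dim = 2000, 32
signal_dir = rng.normal(size=hidden_dim)
signal_dir /= np.linalg.norm(signal_dir)
activations = rng.normal(size=(n_examples, hidden_dim))

# Synthetic labels: correctness depends on the projection onto signal_dir.
true_logit = 2.0 * activations @ signal_dir
is_correct = (rng.random(n_examples) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_linear_probe(X, y, lr=0.1, steps=500, l2=1e-3):
    """Fit an L2-regularized logistic-regression probe by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def probe_confidence(X, w, b):
    """Probe's estimated probability that each answer is correct."""
    return sigmoid(X @ w + b)


# Fit on a labeled split, evaluate on held-out examples.
X_train, y_train = activations[:1500], is_correct[:1500]
X_test, y_test = activations[1500:], is_correct[1500:]
w, b = fit_linear_probe(X_train, y_train)
conf = probe_confidence(X_test, w, b)

# Compare with the uninformative base-rate policy using the Brier score.
probe_brier = float(np.mean((conf - y_test) ** 2))
base_brier = float(np.mean((y_train.mean() - y_test) ** 2))
print(f"probe Brier={probe_brier:.3f}  base-rate Brier={base_brier:.3f}")
```

A linear probe costs one dot product per example at inference time and needs only a modest labeled set to fit, which is the kind of lightweight estimator the summary describes; whether ACUTE uses this estimator, which layers it reads, and how it post-processes scores are not specified here.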

📝 Abstract

As language models improve and are increasingly deployed to solve a variety of tasks, trustworthiness becomes essential. Calibration is a good proxy for trust: well-calibrated confidence estimates help inform the risk-versus-reward tradeoff when deciding whether to trust a specific model output. Unfortunately, even as models improve, they remain poorly calibrated, often erring toward overconfidence. Additionally, calibration can be gamed: a policy that always predicts the base rate is perfectly calibrated but completely uninformative. To resolve this, we develop a new metric, expected utility renormalized by the oracle (EURO), that balances calibration and informativeness. We also propose a general-purpose activation-based confidence, utility, and trust estimation protocol (ACUTE) to appropriately adjudicate uncertainty. The ACUTE protocol provides flexible, sample-efficient, and compute-efficient confidence estimators for three tasks (multiple-choice question answering, tool calling, and scientific document summarization) across six models from four model families. ACUTE outperforms strong baselines on EURO while maintaining low calibration error. Taken together, our work shows that equipping LLMs with the ACUTE protocol can improve calibration, utility, and trustworthiness in numerous settings.
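The claim that a base-rate policy is perfectly calibrated yet uninformative is easy to check numerically. The sketch below uses the standard binned expected calibration error (ECE) and the Brier score; it does not implement EURO, whose formula is not given in this abstract. The 70%-accuracy labels and the two confidence policies are made-up illustrative data.

```python
import numpy as np


def expected_calibration_error(conf, correct, n_bins=10):
    """Binned ECE: bin-weighted mean of |accuracy - mean confidence|."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the top bin.
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in the bin
    return float(ece)


def brier_score(conf, correct):
    """Mean squared error between confidence and 0/1 correctness."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    return float(np.mean((conf - correct) ** 2))


# Illustrative data: the model answers 70 of 100 questions correctly.
correct = np.array([1] * 70 + [0] * 30)

# Policy A: always report the base rate (uninformative).
base_rate_conf = np.full(correct.shape, correct.mean())
# Policy B: sharp and informative, 0.95 when right and 0.05 when wrong.
informative_conf = np.where(correct == 1, 0.95, 0.05)

ece_base = expected_calibration_error(base_rate_conf, correct)
ece_inf = expected_calibration_error(informative_conf, correct)
brier_base = brier_score(base_rate_conf, correct)
brier_inf = brier_score(informative_conf, correct)

print(f"base rate:   ECE={ece_base:.4f}  Brier={brier_base:.4f}")
print(f"informative: ECE={ece_inf:.4f}  Brier={brier_inf:.4f}")
```

It prints an ECE of 0.0000 for the base-rate policy versus 0.0500 for the informative one, so ECE alone prefers the useless policy, while the Brier scores (0.2100 versus 0.0025) rank them the other way because a proper scoring rule also rewards discrimination. This gap between calibration and informativeness is what a combined metric such as EURO is meant to close.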

Problem

Research questions and friction points this paper is trying to address.

calibration

trustworthiness

overconfidence

informativeness

language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

calibration

trustworthiness

ACUTE protocol