CaliDist: Calibrating Large Language Models via Behavioral Robustness to Distraction

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing calibration methods for large language models often overlook the models’ behavioral robustness when confronted with irrelevant or misleading information, leading to unreliable confidence estimates. This work proposes CaliDist, a novel post-hoc calibration approach that uniquely leverages behavioral robustness as a calibration signal: by injecting semantic distractors into the input and observing the resulting shifts in predictions and uncertainty, CaliDist adaptively rescales the model’s initial confidence scores. Extensive experiments across seven natural language understanding benchmarks and six prominent large language models demonstrate that CaliDist reduces the average Expected Calibration Error (ECE) from 23% to 7%, achieving a relative improvement of 70% and substantially outperforming strong existing baselines.
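For readers unfamiliar with the metric, the following is a minimal sketch of how ECE is commonly computed, together with the arithmetic behind the reported relative improvement. It assumes equal-width confidence bins with ten bins by default, which is one common convention and not necessarily the one used in the paper; the function names `expected_calibration_error` and `relative_improvement` are illustrative and do not come from the CaliDist code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (assumed convention, not taken from the paper).

    confidences: floats in [0, 1], one per prediction.
    correct: 1 if the prediction was right, 0 otherwise.
    Returns the bin-size-weighted mean of |accuracy - mean confidence|.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


def relative_improvement(before, after):
    """Fractional reduction of an error metric, e.g. ECE."""
    return (before - after) / before


# The reported average drop from 23% to 7% ECE:
reported = relative_improvement(0.23, 0.07)  # about 0.696, i.e. roughly 70%
```

A model that answers four questions with 0.9 confidence and gets three right has an ECE of about 0.15 under this definition, since its accuracy (0.75) falls 0.15 short of its stated confidence.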

📝 Abstract

Existing calibration methods for Large Language Models (LLMs) often overlook a critical dimension of trustworthiness: a model's behavioral robustness to irrelevant or misleading information. In this paper, we argue that a model's true confidence should reflect its stability under cognitive pressure. We introduce CaliDist, a novel post-hoc calibration approach that directly measures and penalizes a model's susceptibility to distraction. CaliDist quantifies how an LLM's predictions and uncertainty change when its input prompt is perturbed with semantic distractors. This stability signal (or the lack of it) is then used to adaptively scale the model's initial confidence score. Our extensive experiments on seven Natural Language Understanding classification benchmarks using six distinct LLMs show that CaliDist consistently achieves lower Expected Calibration Error (ECE) and Brier Score than strong baselines. Remarkably, our method reduces the ECE from 23% to 7% on average (a relative improvement of 70%), demonstrating that behavioral stability is a powerful signal for calibration. We make our code and datasets available at github.com/m-anas-j/CaliDist.
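The abstract does not give the scaling rule itself, so the following is only an illustrative sketch of the general idea, not the paper's formula. It assumes the model has already been queried once on the clean prompt and several times on distractor-perturbed prompts, and it uses the fraction of perturbed runs that keep the original label as a stand-in stability signal. It ignores the uncertainty shifts that CaliDist also measures. The function name `stability_scaled_confidence` and the multiplicative rule are hypothetical.

```python
def stability_scaled_confidence(base_label, base_conf, distracted_labels):
    """Scale a confidence score by label stability under distraction.

    Hypothetical rule for illustration only: multiply the clean-prompt
    confidence by the share of distractor-perturbed runs whose predicted
    label matches the clean-prompt label.

    base_label: label predicted on the clean prompt.
    base_conf: confidence in [0, 1] for that prediction.
    distracted_labels: labels predicted on the perturbed prompts.
    """
    if not distracted_labels:
        # No perturbed runs means no stability evidence; leave as is.
        return base_conf
    kept = sum(1 for label in distracted_labels if label == base_label)
    agreement = kept / len(distracted_labels)
    return base_conf * agreement


# A prediction that survives 3 of 4 distractors is scaled from 0.9 down
# to about 0.675; one that flips every time would be scaled to 0.
scaled = stability_scaled_confidence("positive", 0.9,
                                     ["positive", "positive", "negative", "positive"])
```

The multiplicative form is chosen here only because it can never raise confidence, which matches the abstract's description of penalizing susceptibility to distraction.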

Problem

Research questions and friction points this paper is trying to address.

calibration

large language models

behavioral robustness

distraction

trustworthiness

Innovation

Methods, ideas, or system contributions that make the work stand out.

behavioral robustness

calibration

large language models