Prompting without Panic: Attribute-aware, Zero-shot, Test-Time Calibration

📅 2025-06-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Vision-language models (VLMs) often suffer from degraded confidence calibration under test-time prompt tuning (TPT), limiting their deployment in safety-critical applications despite improved recognition accuracy. To address this, we propose an attribute-aware zero-shot test-time calibration method. First, we leverage large language models (LLMs) to extract semantic attribute priors, guiding prompt initialization and mitigating overfitting. Second, we design a regularization loss that enforces intra-class compactness and inter-class separation, enhancing robustness of zero-shot adaptation. Our approach jointly optimizes domain adaptation and calibration in an end-to-end manner. Evaluated across 15 diverse datasets and multiple CLIP variants, it achieves substantial improvements in calibration performance: the average expected calibration error (ECE) drops to 4.11, significantly lower than baseline TPT (11.7), C-TPT (6.12), and PromptAlign (8.43).
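As a rough illustration (not the paper's actual implementation, which operates on CLIP text embeddings of learned prompts), the regularization idea described above, pulling samples toward their class centroid while keeping centroids apart, can be sketched as:

```python
import numpy as np

def calibration_regularizer(embeddings, labels, margin=0.5):
    """Toy intra-class compactness / inter-class separation penalty.

    embeddings: (N, D) array of embeddings (e.g. prompt-conditioned
                class features); labels: (N,) integer class ids.
    Returns mean distance of samples to their class centroid (intra
    term) plus a hinge penalizing centroid pairs closer than `margin`.
    The margin value here is an arbitrary choice for illustration.
    """
    classes = np.unique(labels)
    centroids = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in classes]
    )

    # Intra-class term: average distance to the own-class centroid.
    intra = np.mean([
        np.linalg.norm(embeddings[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])

    # Inter-class term: hinge on pairwise centroid distances.
    inter, pairs = 0.0, 0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            d = np.linalg.norm(centroids[i] - centroids[j])
            inter += max(0.0, margin - d)
            pairs += 1
    inter /= max(pairs, 1)

    return intra + inter
```

Minimizing such a loss alongside the TPT entropy objective discourages the learned prompt from collapsing class embeddings together, which is one intuition for why it helps calibration.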

📝 Abstract
Vision-language models (VLMs) have demonstrated impressive performance in image recognition by leveraging self-supervised training on large datasets. Their performance can be further improved by adapting to the test sample using test-time prompt tuning (TPT). Unfortunately, the singular focus of TPT approaches on improving accuracy suffers from tunnel vision and leads to degradation in confidence calibration. This limits the applicability of TPT in critical applications. We make three contributions in this work. (1) We posit that random or naive initialization of prompts leads to overfitting on a particular test sample, and is the main reason for miscalibration of the VLM after TPT. To mitigate the problem, we propose careful initialization of the test-time prompt using prior knowledge about the target label attributes from a large language model (LLM). (2) To further maintain the quality of prompts during TPT, we propose a novel regularization loss to reduce intra-class distance and increase inter-class distance between the learnt prompts. (3) Through extensive experiments on different CLIP architectures and 15 datasets, we show that our approach can effectively improve calibration after TPT. We report an average expected calibration error (ECE) of 4.11 with our method, TCA, compared to 11.7 for vanilla TPT, 6.12 for C-TPT (ICLR'24), 6.78 for DiffTPT (CVPR'23), and 8.43 for PromptAlign (NeurIPS'23). The code is publicly accessible at: https://github.com/rhebbalaguppe/TCA_PromptWithoutPanic.
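For reference, the ECE numbers quoted in the abstract come from the standard binned calibration metric: predictions are grouped into confidence bins and the gap between each bin's accuracy and mean confidence is averaged, weighted by bin size. A generic sketch (not code from the paper's repository; the paper reports values on a percentage scale, so 4.11 corresponds to 0.0411 here):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over (0, 1].

    confidences: per-sample max softmax probability.
    correct:     1 if the prediction was right, else 0.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()          # bin accuracy
            conf = confidences[mask].mean()     # bin mean confidence
            ece += (mask.sum() / n) * abs(acc - conf)
    return ece
```

A model that is 95% confident but only 50% correct gets a large ECE, which is exactly the overconfidence failure mode the paper attributes to vanilla TPT.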
Problem

Research questions and friction points this paper is trying to address.

Improves calibration in test-time prompt tuning for vision-language models
Addresses overfitting from naive prompt initialization using LLM knowledge
Reduces intra-class distance and increases inter-class distance via a regularization loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based prompt initialization for better calibration
Regularization loss to reduce intra-class distance and increase inter-class distance
Improves calibration error significantly over baselines