Confidence Calibration in Large Language Model-Based Entity Matching

📅 2025-09-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses overconfidence in transformer language models, specifically RoBERTa, when applied to entity matching tasks. The authors systematically compare baseline RoBERTa confidences against three calibration techniques: temperature scaling, Monte Carlo Dropout, and model ensembling, evaluated across standard entity matching benchmarks (Abt-Buy, DBLP-ACM, iTunes-Amazon, and Company). Results show that the baseline RoBERTa model is slightly overconfident, with Expected Calibration Error (ECE) scores ranging from 0.0043 to 0.0552 across datasets, and that temperature scaling is the most effective single intervention, reducing ECE by up to 23.83%. The study provides a reproducible calibration comparison and empirical reference point for deploying such models reliably in downstream matching applications.

📝 Abstract
This research aims to explore the intersection of Large Language Models and confidence calibration in Entity Matching. To this end, we perform an empirical study to compare baseline RoBERTa confidences for an Entity Matching task against confidences that are calibrated using Temperature Scaling, Monte Carlo Dropout and Ensembles. We use the Abt-Buy, DBLP-ACM, iTunes-Amazon and Company datasets. The findings indicate that the proposed modified RoBERTa model exhibits a slight overconfidence, with Expected Calibration Error scores ranging from 0.0043 to 0.0552 across datasets. We find that this overconfidence can be mitigated using Temperature Scaling, reducing Expected Calibration Error scores by up to 23.83%.
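The Expected Calibration Error reported above measures the gap between a model's stated confidence and its observed accuracy, averaged over confidence bins. A minimal sketch of the standard binned ECE computation (illustrative code, not taken from the paper; the function name and equal-width 10-bin scheme are common conventions, not necessarily the authors' exact setup):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average |confidence - accuracy| gap,
    weighted by the fraction of predictions falling in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # bin weight times calibration gap
    return ece
```

For example, a model that predicts with 80% confidence but is always right contributes a gap of 0.2, while a model whose 90%-confidence predictions are right 90% of the time contributes nothing.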
Problem

Research questions and friction points this paper is trying to address.

Calibrating confidence scores for entity matching using large language models
Addressing overconfidence issues in RoBERTa model predictions for entity matching
Improving confidence calibration accuracy through temperature scaling techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Calibrating RoBERTa confidences with Temperature Scaling
Comparing Monte Carlo Dropout and Ensemble calibration methods
Reducing Expected Calibration Error by up to 23.83%
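Temperature scaling, the method highlighted above, rescales a trained model's logits by a single scalar T fitted on held-out data, leaving predicted classes unchanged while softening (T > 1) or sharpening (T < 1) the confidences. A hedged sketch using a simple grid search over T to minimize negative log-likelihood (the grid range and helper names are illustrative; the paper's exact optimization procedure is not specified here):

```python
import numpy as np

def softmax(z):
    """Numerically stable row-wise softmax."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Choose T minimizing validation NLL; at test time use softmax(logits / T)."""
    best_T, best_nll = 1.0, np.inf
    n = len(labels)
    for T in grid:
        probs = softmax(logits / T)
        nll = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T
```

Because dividing logits by a positive T preserves their ordering, accuracy is untouched; only the confidence distribution shifts, which is why temperature scaling can reduce ECE without changing matching decisions.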
Iris Kamsteeg
Bernoulli Institute, University of Groningen, The Netherlands
Juan Cardenas-Cartagena
Lecturer in AI, University of Groningen
Optimization, Control Systems, Reinforcement Learning, Cyber-physical Systems
Floris van Beers
Independent Researcher
Gineke ten Holt
Independent Researcher
Tsegaye Misikir Tashu
Bernoulli Institute, University of Groningen, The Netherlands
Matias Valdenegro-Toro
Assistant Professor of Machine Learning, Bernoulli Institute, University of Groningen
Uncertainty in Machine Learning, Bayesian Deep Learning, Robot Perception