AI Summary
This paper identifies an inherent performance trade-off in large language models (LLMs) between coreference resolution and ambiguity detection: while LLMs can be individually optimized for either task, they struggle to excel at both simultaneously. To formalize this tension, we introduce the "CORRECT-DETECT trade-off": the first systematic characterization of the intrinsic conflict between referential accuracy and ambiguity awareness in LLMs. Using a minimal-prompt paradigm, we evaluate both tasks jointly within a unified framework and conduct empirical analysis on human-annotated data. Experiments across mainstream LLMs reveal a significant negative correlation between coreference resolution and ambiguity detection performance, with no model achieving Pareto-optimality on both. This finding exposes a fundamental limitation in LLMs' deep semantic modeling of referential structure and provides critical theoretical insight and empirical evidence for designing robust coreference resolution systems.
Abstract
Large Language Models (LLMs) are intended to reflect human linguistic competencies. But humans have access to a broad and embodied context, which is key in detecting and resolving linguistic ambiguities, even in isolated text spans. A foundational case of semantic ambiguity is found in the task of coreference resolution: how is a pronoun related to an earlier person mention? This capability is implicit in nearly every downstream task, and the presence of ambiguity at this level can alter performance significantly. We show that LLMs can achieve good performance with minimal prompting in both coreference disambiguation and the detection of ambiguity in coreference; however, they cannot do both at the same time. We present the CORRECT-DETECT trade-off: though models have both capabilities and deploy them implicitly, successful performance balancing these two abilities remains elusive.