Correct-Detect: Balancing Performance and Ambiguity Through the Lens of Coreference Resolution in LLMs

📅 2025-09-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper identifies an inherent performance trade-off in large language models (LLMs) between coreference resolution and ambiguity detection: while LLMs can be individually optimized for either task, they struggle to excel at both simultaneously. To formalize this tension, we introduce the "CORRECT-DETECT trade-off": the first systematic characterization of the intrinsic conflict between referential accuracy and ambiguity awareness in LLMs. Using a minimal-prompt paradigm, we evaluate both tasks jointly within a unified framework and conduct empirical analysis on human-annotated data. Experiments across mainstream LLMs reveal a significant negative correlation between coreference resolution and ambiguity detection performance, with no model achieving Pareto-optimality on both. This finding exposes a fundamental limitation in LLMs' deep semantic modeling of referential structure and provides critical theoretical insight and empirical evidence for designing robust coreference resolution systems.

๐Ÿ“ Abstract
Large Language Models (LLMs) are intended to reflect human linguistic competencies. But humans have access to a broad and embodied context, which is key in detecting and resolving linguistic ambiguities, even in isolated text spans. A foundational case of semantic ambiguity is found in the task of coreference resolution: how is a pronoun related to an earlier person mention? This capability is implicit in nearly every downstream task, and the presence of ambiguity at this level can alter performance significantly. We show that LLMs can achieve good performance with minimal prompting in both coreference disambiguation and the detection of ambiguity in coreference; however, they cannot do both at the same time. We present the CORRECT-DETECT trade-off: though models have both capabilities and deploy them implicitly, successful performance balancing these two abilities remains elusive.
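The joint evaluation the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's actual prompts or harness: the prompt wording, the `ask_llm` callable, and the example schema (`text`, `pronoun`, `antecedent`, `ambiguous`) are all hypothetical stand-ins. The point is that the CORRECT side forces a single antecedent while the DETECT side asks whether a unique antecedent exists, so both scores come from the same data.

```python
def resolve_prompt(text: str, pronoun: str) -> str:
    # CORRECT side: force the model to commit to one antecedent.
    return f'In the passage below, who does "{pronoun}" refer to? Answer with one name.\n\n{text}'

def detect_prompt(text: str, pronoun: str) -> str:
    # DETECT side: ask whether the reference is ambiguous at all.
    return f'In the passage below, is the referent of "{pronoun}" ambiguous? Answer yes or no.\n\n{text}'

def evaluate(examples, ask_llm):
    """Score resolution and detection accuracy on the same examples.

    `ask_llm` is any text-in/text-out model call. A model that always
    commits to an antecedent tends to score high on the first number
    and low on the second, and vice versa -- the trade-off the paper
    measures.
    """
    correct = detect = 0
    for ex in examples:
        answer = ask_llm(resolve_prompt(ex["text"], ex["pronoun"])).strip()
        if answer == ex["antecedent"]:
            correct += 1
        verdict = ask_llm(detect_prompt(ex["text"], ex["pronoun"])).strip().lower()
        if (verdict == "yes") == ex["ambiguous"]:
            detect += 1
    n = len(examples)
    return correct / n, detect / n
```

In practice `ask_llm` would wrap a chat-completion API; any stub that answers both prompt styles can exercise the loop.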
Problem

Research questions and friction points this paper is trying to address.

LLMs struggle with coreference resolution ambiguity
Models cannot simultaneously detect and resolve ambiguities
Balancing disambiguation and ambiguity detection remains challenging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint evaluation of coreference resolution and ambiguity detection in a unified framework
LLMs resolve and detect coreference ambiguity with minimal prompting
CORRECT-DETECT trade-off formalizes the tension between the two implicitly deployed capabilities