Is the Top Still Spinning? Evaluating Subjectivity in Narrative Understanding

📅 2025-04-01
🤖 AI Summary
Problem: Factuality assessment in narrative understanding is subjective, particularly when annotators must judge whether a statement is faithful to a source document and the boundary between supported and unsupported is ambiguous. Method: The paper reframes binary faithfulness classification as a measurement of ambiguity and introduces the Ambiguity Rewrite Metric (ARM). ARM uses a large language model to generate controlled edits of a summary and quantifies a claim's ambiguity by whether the claim gets rewritten and how much it changes, rather than by a binary label. The approach combines controlled summary editing, narrative consistency modeling, and an evaluation framework that quantifies subjectivity. Results: On narrative summarization, ARM yields a 21% absolute improvement (21 percentage points) in annotator agreement on claim faithfulness, which reduces the unreliability that divergent subjective interpretations introduce into factuality evaluation. The work is presented as the first generative, rewriting-based approach to quantifying ambiguity in factuality assessment.

📝 Abstract
Determining faithfulness of a claim to a source document is an important problem across many domains. This task is generally treated as a binary judgment of whether the claim is supported or unsupported in relation to the source. In many cases, though, whether a claim is supported can be ambiguous. For instance, it may depend on making inferences from given evidence, and different people can reasonably interpret the claim as either supported or unsupported based on their agreement with those inferences. Forcing binary labels upon such claims lowers the reliability of evaluation. In this work, we reframe the task to manage the subjectivity involved with factuality judgments of ambiguous claims. We introduce LLM-generated edits of summaries as a method of providing a nuanced evaluation of claims: how much does a summary need to be edited to be unambiguous? Whether a claim gets rewritten and how much it changes can be used as an automatic evaluation metric, the Ambiguity Rewrite Metric (ARM), with a much richer feedback signal than a binary judgment of faithfulness. We focus on the area of narrative summarization as it is particularly rife with ambiguity and subjective interpretation. We show that ARM produces a 21% absolute improvement in annotator agreement on claim faithfulness, indicating that subjectivity is reduced.
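The abstract says ARM is built from two signals: whether a claim gets rewritten, and how much it changes. The sketch below illustrates one way such a change score could be computed. It is not the paper's implementation: the function name `rewrite_magnitude`, the word-level similarity from Python's `difflib`, and the example claims are all assumptions made for illustration, and the LLM editing step is taken as already done.

```python
from difflib import SequenceMatcher


def rewrite_magnitude(original: str, rewritten: str) -> float:
    """Score how much a claim changed after editing, in [0, 1].

    0.0 means the claim was left untouched; values near 1.0 mean it was
    almost entirely rewritten. Assumption: word-level similarity from
    difflib stands in for whatever change measure the paper uses.
    """
    original_tokens = original.split()
    rewritten_tokens = rewritten.split()
    matcher = SequenceMatcher(None, original_tokens, rewritten_tokens, autojunk=False)
    return 1.0 - matcher.ratio()


# Hypothetical claim and a hypothetical LLM edit that hedges an inference.
claim = "The hero returns home."
edited = "The hero appears to return home."

was_rewritten = claim != edited
print(was_rewritten, rewrite_magnitude(claim, edited))
```

A graded score like this carries more information than a supported/unsupported label: an untouched claim scores 0.0, a lightly hedged claim scores low, and a claim that had to be replaced scores high.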

Problem

Research questions and friction points this paper is trying to address.

Evaluating subjectivity in narrative understanding
Managing ambiguity in factuality judgments of claims
Improving annotator agreement on claim faithfulness

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-generated edits for nuanced evaluation
Ambiguity Rewrite Metric (ARM) as feedback
21% absolute improvement in annotator agreement on claim faithfulness
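To make "absolute improvement in annotator agreement" concrete, here is a minimal sketch that computes raw pairwise agreement and the difference between two conditions. This summary does not say which agreement statistic the paper reports, so raw pairwise agreement is an assumption, and the function name `pairwise_agreement` and all labels below are hypothetical toy data, not the paper's results.

```python
from itertools import combinations


def pairwise_agreement(labels_per_item: list[list[str]]) -> float:
    """Fraction of annotator pairs that gave the same label, pooled over items."""
    agreeing = 0
    total = 0
    for labels in labels_per_item:
        for first, second in combinations(labels, 2):
            agreeing += first == second
            total += 1
    return agreeing / total if total else 0.0


# Hypothetical labels from three annotators on two claims.
original_claims = [
    ["supported", "unsupported", "supported"],
    ["unsupported", "supported", "unsupported"],
]
# Hypothetical labels after the same claims were edited to remove ambiguity.
edited_claims = [
    ["supported", "supported", "supported"],
    ["unsupported", "unsupported", "supported"],
]

# An absolute improvement is a difference between two agreement rates,
# measured in percentage points, not a ratio between them.
absolute_gain = pairwise_agreement(edited_claims) - pairwise_agreement(original_claims)
print(f"{absolute_gain * 100:.0f} percentage points")
```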