Dissecting Atomic Facts: Visual Analytics for Improving Fact Annotations in Language Model Evaluation

📅 2025-09-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM factuality evaluation is hindered by ambiguous definitions of “atomic facts,” leading to substantial disagreement among annotators, both human and model-based. To address this, we propose the first visualization-based analytical framework specifically designed for diagnosing ambiguity in fact decomposition. Our method identifies three core issues (semantic misalignment, granularity mismatch, and referential dependency) and supports their iterative refinement through interactive visualizations. It integrates state-of-the-art NLP-based fact decomposition techniques with interpretable visual design to localize inconsistencies and guide targeted revisions. Experimental results show that the framework substantially improves inter-annotator agreement (Cohen’s κ increases by 0.32), enhances the stability and reproducibility of factuality assessment, and establishes a collaborative, auditable infrastructure for building high-fidelity evaluation benchmarks.
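The reported agreement gain is stated in terms of Cohen’s κ, which compares observed annotator agreement to the agreement expected by chance. As a quick reference (not code from the paper), a minimal sketch with hypothetical annotation labels:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each annotator's label marginals.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from the product of per-label marginals.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels: did each span contain a checkable atomic fact?
ann1 = ["fact", "fact", "none", "none"]
ann2 = ["fact", "none", "none", "none"]
print(cohens_kappa(ann1, ann2))  # → 0.5
```

A κ increase of 0.32, as reported in the summary, would move annotators a substantial step up this scale (values near 1.0 indicate near-perfect agreement beyond chance).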

📝 Abstract
Factuality evaluation of large language model (LLM) outputs requires decomposing text into discrete "atomic" facts. However, existing definitions of atomicity are underspecified, and empirical results show high disagreement among annotators, both human and model-based, due to unresolved ambiguity in fact decomposition. We present a visual analytics concept to expose and analyze annotation inconsistencies in fact extraction. By visualizing semantic alignment, granularity, and referential dependencies, our approach aims to enable systematic inspection of extracted facts and to facilitate convergence through guided revision loops, establishing a more stable foundation for factuality evaluation benchmarks and improving LLM evaluation.
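To illustrate the granularity problem the abstract describes, consider two annotators decomposing the same sentence at different levels of atomicity (the sentence and decompositions below are illustrative, not taken from the paper):

```python
sentence = "Marie Curie, a Polish-born physicist, won two Nobel Prizes."

# Annotator A: coarse decomposition, one compound fact.
coarse = ["Marie Curie was a Polish-born physicist who won two Nobel Prizes."]

# Annotator B: fine decomposition, separate atomic claims.
fine = [
    "Marie Curie was a physicist.",
    "Marie Curie was born in Poland.",
    "Marie Curie won two Nobel Prizes.",
]

# A granularity mismatch: the same span yields different fact counts,
# which destabilizes any fact-level agreement or factuality score.
print(len(coarse), len(fine))  # → 1 3
```

Semantic misalignment (facts that paraphrase the source differently) and referential dependency (facts like "She won two Nobel Prizes." that are unverifiable without resolving the pronoun) are the other two inconsistency types the visualization targets.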
Problem

Research questions and friction points this paper is trying to address.

Addresses ambiguity in defining atomic facts for LLM evaluation
Resolves high annotator disagreement in fact decomposition
Improves consistency in factuality evaluation benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual analytics for fact annotation inconsistencies
Visualizing semantic alignment and granularity dependencies
Guided revision loops for stable evaluation benchmarks
Manuel Schmidt
University of Konstanz
Daniel A. Keim
University of Konstanz
Frederik L. Dennig
University of Konstanz
Visual Analytics · Information Visualization · High-Dimensional Data