🤖 AI Summary
Current LLM factuality evaluation is hindered by ambiguous definitions of “atomic facts,” which lead to substantial disagreement between human and model annotators. To address this, we propose the first visualization-based analytical framework designed specifically for diagnosing ambiguity in fact decomposition. Through interactive visualizations, our method systematically identifies, and supports iterative refinement of, three core issues: semantic misalignment, granularity mismatch, and referential dependency. It integrates state-of-the-art NLP-based fact decomposition techniques with interpretable visual design to localize inconsistencies and guide targeted revisions. Experimental results show that our framework significantly improves inter-annotator agreement (Cohen’s κ increases by 0.32), enhances the stability and reproducibility of factuality assessment, and establishes a collaborative, auditable infrastructure for building high-fidelity evaluation benchmarks.
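To make the agreement metric concrete, here is a minimal sketch of how Cohen's κ could be scored over per-fact annotations. The `cohens_kappa` helper and the example labels are hypothetical illustrations under our own assumptions, not the framework's actual evaluation code.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two annotators over the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two annotators give the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: overlap expected from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(labels_a) | set(labels_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical per-fact judgements ("atomic", "split", "merge") from a human and a model annotator.
human_labels = ["atomic", "atomic", "split", "atomic", "merge", "atomic"]
model_labels = ["atomic", "split",  "split", "atomic", "atomic", "atomic"]
print(f"Cohen's kappa = {cohens_kappa(human_labels, model_labels):.2f}")  # ~0.33 on this toy data
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is a more conservative measure than simple percent overlap between annotators.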
📝 Abstract
Factuality evaluation of large language model (LLM) outputs requires decomposing text into discrete "atomic" facts. However, existing definitions of atomicity are underspecified: empirical results show high disagreement among annotators, both human and model-based, stemming from unresolved ambiguity in fact decomposition. We present a visual analytics concept for exposing and analyzing annotation inconsistencies in fact extraction. By visualizing semantic alignment, granularity, and referential dependencies, our approach aims to enable systematic inspection of extracted facts and to facilitate convergence through guided revision loops, establishing a more stable foundation for factuality evaluation benchmarks and for LLM evaluation more broadly. An illustrative sketch of the three inconsistency types follows.
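As a rough illustration of the inconsistencies the visualization is meant to surface, the sketch below compares two hypothetical annotators' decompositions of the same sentence using a simple token-overlap alignment. The `align` and `flag_referential` helpers, the overlap threshold, and the example facts are assumptions made for this sketch, not the method described in the paper.

```python
import re

PRONOUNS = {"he", "she", "it", "they", "this", "that", "these", "those"}

def token_set(fact: str) -> set:
    return set(re.findall(r"[a-z']+", fact.lower()))

def align(facts_a, facts_b, threshold=0.5):
    """Greedily pair facts from two annotators by Jaccard token overlap."""
    pairs, unmatched_a, used_b = [], [], set()
    for fa in facts_a:
        scored = [
            (len(token_set(fa) & token_set(fb)) / max(1, len(token_set(fa) | token_set(fb))), j)
            for j, fb in enumerate(facts_b) if j not in used_b
        ]
        best = max(scored, default=(0.0, None))
        if best[0] >= threshold:
            pairs.append((fa, facts_b[best[1]], best[0]))
            used_b.add(best[1])
        else:
            unmatched_a.append(fa)  # semantic misalignment: no close counterpart found
    unmatched_b = [fb for j, fb in enumerate(facts_b) if j not in used_b]
    return pairs, unmatched_a, unmatched_b

def flag_referential(facts):
    """Facts whose meaning hinges on an unresolved pronoun (referential dependency)."""
    return [f for f in facts if token_set(f) & PRONOUNS]

# Hypothetical decompositions of the same source sentence by two annotators.
annotator_a = ["Marie Curie won the Nobel Prize in Physics",
               "Marie Curie won the Nobel Prize in Chemistry"]
annotator_b = ["She won two Nobel Prizes"]  # coarser granularity, unresolved "She"

pairs, only_a, only_b = align(annotator_a, annotator_b)
print("granularity mismatch:", len(annotator_a) != len(annotator_b))
print("unmatched facts:", only_a + only_b)
print("referential dependency:", flag_referential(annotator_a + annotator_b))
```

On this toy input the two decompositions fail to align at all, the fact counts differ, and one fact depends on an unresolved pronoun, which is exactly the kind of per-fact evidence an interactive view would need to expose for guided revision.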