paper.json: A Coordination Convention for LLM-Agent-Actionable Papers

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three recurring failures of large language model (LLM) agents that read academic papers: sub-claims that cannot be cited below whole-paper granularity, scope extended beyond what the paper actually tests, and figure-generation commands buried in codebases rather than stated in the paper. To make papers easier for agents to act on, the authors propose paper.json, a lightweight companion JSON file that travels with the PDF and encodes structured metadata. The approach introduces five conventions (C1–C5): stable claim identifiers (C1), an explicit list of things the paper does not claim (C2), exact per-figure shell commands (C3), minimum viable compliance through hand-written JSON (C4), and stable definition identifiers (C5). The file can be checked by a validator script and cross-verified against the typesetting source (e.g., Typst). According to the authors, a compliant paper.json can be hand-written in under an hour for a finished paper without changing the human-readable output. Whether C1, C2, C3, and C5 improve agent behavior is left as an open invitation: agents that read and act on compliant papers produce the evidence for or against them.

📝 Abstract

LLM agents routinely serve as first (and sometimes only) readers of academic papers, skimming for sub-claims, extracting reproducibility steps, and generalizing scope. Standard prose papers produce recurring failures in this role: sub-claims that cannot be cited at sub-paper granularity, scope overextension beyond what the paper tests, and figure commands buried in codebases rather than the paper itself. We propose `paper.json`, a companion JSON file that travels with the PDF and addresses each failure with a lightweight convention: stable claim IDs (C1), an explicit does-not-claim list (C2), exact per-figure shell commands (C3), and stable definition IDs (C5). A fifth convention (C4) holds that minimum viable compliance, hand-written JSON alongside the PDF, is achievable in under an hour for a finished paper without touching the human-readable output. C1, C2, C3, and C5 are open invitations: an agent that reads a compliant paper and acts on it produces evidence for or against them. This paper is itself compliant: `uv run validator.py paper.json --against paper.typ` passes. Repo: https://github.com/arquicanedo/paper-json
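To make the conventions concrete, the sketch below shows what a companion file and a few structural checks might look like. Everything in it is an assumption made for illustration: the field names (`claims`, `does_not_claim`, `figures`, `definitions`), the ID pattern, the placeholder entries, and the helper functions `check_companion` and `ids_missing_from_source` are hypothetical and are not taken from the paper's schema or from its `validator.py`. The cross-check also assumes that IDs appear literally in the typesetting source, which the actual convention may not require.

```python
import json
import re

# Hypothetical companion file. Field names and entries are placeholders,
# not the schema used by the paper's repository.
EXAMPLE_PAPER_JSON = """
{
  "claims": [
    {"id": "claim-main", "text": "Placeholder text for a citable sub-claim."}
  ],
  "does_not_claim": [
    "Placeholder: a scope the paper does not test."
  ],
  "figures": [
    {"id": "fig-1", "command": "python make_fig1.py --out fig1.pdf"}
  ],
  "definitions": [
    {"id": "def-term", "term": "placeholder term", "text": "Placeholder definition."}
  ]
}
"""

# Assumed ID shape: lowercase letters, digits, and hyphens.
ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]*$")


def check_companion(doc: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none found."""
    problems = []
    seen = set()
    # C1 / C5 (and figure IDs): every ID is well formed and unique.
    for section in ("claims", "figures", "definitions"):
        for entry in doc.get(section, []):
            entry_id = entry.get("id", "")
            if not ID_PATTERN.match(entry_id):
                problems.append(f"{section}: malformed id {entry_id!r}")
            elif entry_id in seen:
                problems.append(f"{section}: duplicate id {entry_id!r}")
            seen.add(entry_id)
    # C2: the does-not-claim list must be present and non-empty.
    if not doc.get("does_not_claim"):
        problems.append("does_not_claim: list is missing or empty")
    # C3: every figure carries a non-blank shell command.
    for fig in doc.get("figures", []):
        if not fig.get("command", "").strip():
            problems.append(f"figures: {fig.get('id')!r} has no command")
    return problems


def ids_missing_from_source(doc: dict, source_text: str) -> list[str]:
    """List claim and definition IDs that never appear in the source text."""
    ids = [
        entry["id"]
        for section in ("claims", "definitions")
        for entry in doc.get(section, [])
        if "id" in entry
    ]
    return [entry_id for entry_id in ids if entry_id not in source_text]


if __name__ == "__main__":
    doc = json.loads(EXAMPLE_PAPER_JSON)
    print(check_companion(doc))
    print(ids_missing_from_source(doc, "A sentence tagged <claim-main> in the source."))
```

Returning a list of problems rather than raising on the first one lets an author fix a hand-written file in a single pass, which fits the under-an-hour goal of C4.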

Problem

Research questions and friction points this paper is trying to address.

LLM agents

academic papers

reproducibility

claim granularity

scope overextension

Innovation

Methods, ideas, or system contributions that make the work stand out.

paper.json

LLM agents

structured metadata