AI Summary
This work addresses three recurring limitations of large language model (LLM) agents when they read academic papers: sub-claims that cannot be cited at sub-paper granularity, overgeneralization of scope beyond what the paper tests, and figure-generation commands buried in codebases rather than stated in the paper. To make scholarly documents easier for agents to act on, the authors propose paper.json, a lightweight companion file that travels with the PDF and encodes structured metadata. The approach introduces five conventions (C1–C5): stable claim identifiers, an explicit list of what the paper does not claim, exact per-figure shell commands, minimum viable compliance, and stable definition identifiers. Designed in JSON with a declarative schema, the format supports automated validation scripts and cross-verification against typesetting source files (e.g., Typst). The authors state that a hand-written paper.json can be produced in under an hour for a finished paper without changing the human-readable output. Whether the conventions improve the accuracy and reliability of LLM-based analysis is framed as an open question, to be answered by agents that read and act on compliant papers.
Abstract
LLM agents routinely serve as first (and sometimes only) readers of academic papers, skimming for sub-claims, extracting reproducibility steps, and generalizing scope. Standard prose papers produce recurring failures in this role: sub-claims that cannot be cited at sub-paper granularity, scope overextension beyond what the paper tests, and figure commands buried in codebases rather than the paper itself. We propose `paper.json`, a companion JSON file that travels with the PDF and addresses each failure with a lightweight convention: stable claim IDs (C1), an explicit does-not-claim list (C2), exact per-figure shell commands (C3), and stable definition IDs (C5). A fifth convention (C4) holds that minimum viable compliance, hand-written JSON alongside the PDF, is achievable in under an hour for a finished paper without touching the human-readable output. C1, C2, C3, and C5 are open invitations: an agent that reads a compliant paper and acts on it produces evidence for or against them. This paper is itself compliant: `uv run validator.py paper.json --against paper.typ` passes. Repo: https://github.com/arquicanedo/paper-json
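To make the four content-bearing conventions concrete, the sketch below shows what a companion file and a minimal structural check might look like. It is only an illustration: the field names (`claims`, `does_not_claim`, `figures`, `definitions`, `id`, `command`), the placeholder values, and the helper `check_companion` are assumptions made for this example. They are not the schema or the `validator.py` shipped in the repository, which should be consulted for the actual format.

```python
import json

# Hypothetical companion file. Every field name and value here is an
# illustrative assumption, not the schema from the paper-json repository.
EXAMPLE = """
{
  "claims": [
    {"id": "claim-1", "text": "placeholder: one citable sub-claim"}
  ],
  "does_not_claim": [
    "placeholder: a statement the paper explicitly does not make"
  ],
  "figures": [
    {"id": "fig-1", "command": "placeholder shell command for figure 1"}
  ],
  "definitions": [
    {"id": "def-1", "term": "placeholder term"}
  ]
}
"""


def check_companion(doc: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none found."""
    problems = []

    # C1, C2, C3, C5: each convention maps to one top-level list (assumed names).
    for key in ("claims", "does_not_claim", "figures", "definitions"):
        if not isinstance(doc.get(key), list):
            problems.append(f"missing or non-list field: {key}")

    # Stable IDs (C1, C5, and figure IDs) must be present and unique.
    seen = set()
    for section in ("claims", "figures", "definitions"):
        entries = doc.get(section)
        if not isinstance(entries, list):
            continue
        for entry in entries:
            ident = entry.get("id") if isinstance(entry, dict) else None
            if not ident:
                problems.append(f"{section}: entry without an id")
            elif ident in seen:
                problems.append(f"duplicate id: {ident}")
            else:
                seen.add(ident)

    # C3: every figure carries the exact command that regenerates it.
    figures = doc.get("figures")
    if isinstance(figures, list):
        for fig in figures:
            if isinstance(fig, dict) and not fig.get("command"):
                problems.append(f"figure without a command: {fig.get('id')}")

    return problems


if __name__ == "__main__":
    print(check_companion(json.loads(EXAMPLE)))
```

A check of this kind covers only the shape of the file. Cross-verification against the typesetting source, as in the `--against paper.typ` invocation quoted in the abstract, would additionally require parsing that source and is not attempted here.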