Comparing LLM-generated and human-authored news text using formal syntactic theory

📅 2025-06-02
🤖 AI Summary
This study investigates systematic syntactic differences between news text that large language models (LLMs) generate in the style of *The New York Times* and news text written by humans.
Method: Head-driven Phrase Structure Grammar (HPSG), a formal, constraint-based syntactic theory, is applied for the first time to large-scale, automated syntactic comparison of LLM-generated and human-written text. An automated parsing pipeline builds HPSG-based syntactic trees and computes type-level distributional statistics for the output of six state-of-the-art LLMs.
Contribution/Results: The study identifies statistically significant deviations in the distributions of core syntactic categories, particularly in complex clausal embedding, left-peripheral constraints, and phrasal projection consistency, which reveal quantifiable, systematic syntactic biases in LLM output. The work establishes the first formal-grammar-based, interpretable, and reproducible benchmark for evaluating the syntactic competence of LLMs, and it provides theoretical grounding and empirical evidence for diagnosing and improving deficiencies in how language models handle grammar.
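To make "type-level distributional statistics" concrete, here is a minimal sketch of one way such a comparison could be computed once every parsed sentence has been reduced to a list of grammar-type labels. It is not the authors' pipeline: the type labels, the toy data, and the choice of Jensen-Shannon divergence as the summary statistic are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def type_distribution(type_labels):
    """Turn a list of grammar-type labels into relative frequencies."""
    counts = Counter(type_labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two type distributions.

    The result lies in [0, 1]; 0 means the distributions are identical.
    """
    support = set(p) | set(q)
    # Mixture distribution over the union of both supports.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in support}

    def kl(a, b):
        # Only types with non-zero mass in `a` contribute to the sum.
        return sum(a[t] * log2(a[t] / b[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def largest_shifts(p, q, k=3):
    """Return the k types whose relative frequency differs most (q minus p)."""
    support = set(p) | set(q)
    diffs = {t: q.get(t, 0.0) - p.get(t, 0.0) for t in support}
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:k]


# Hypothetical placeholder labels, not actual types from any HPSG grammar.
human_types = ["head-comp"] * 5 + ["subj-head"] * 3 + ["head-adj"] * 2
llm_types = ["head-comp"] * 6 + ["subj-head"] * 3 + ["head-adj"] * 1

human_dist = type_distribution(human_types)
llm_dist = type_distribution(llm_types)
divergence = js_divergence(human_dist, llm_dist)
shifts = largest_shifts(human_dist, llm_dist, k=2)
```

A single divergence score summarizes how far apart two corpora are, while `largest_shifts` points to the individual types that drive the gap, which is closer to the per-type view the summary describes. Claims of statistical significance would need a proper test on the raw counts, which this sketch does not attempt.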

📝 Abstract
This study provides the first comprehensive comparison of New York Times-style text generated by six large language models against real, human-authored NYT writing. The comparison is based on a formal syntactic theory. We use Head-driven Phrase Structure Grammar (HPSG) to analyze the grammatical structure of the texts. We then investigate and illustrate the differences in the distributions of HPSG grammar types, revealing systematic distinctions between human and LLM-generated writing. These findings contribute to a deeper understanding of the syntactic behavior of both LLMs and humans within the NYT genre.
Problem

Research questions and friction points this paper is trying to address.

Compare LLM-generated and human NYT text syntactically
Analyze grammatical differences using HPSG theory
Reveal systematic distinctions in syntax distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compare LLM and human text using syntactic theory
Analyze grammar with Head-driven Phrase Structure Grammar
Reveal systematic differences in grammar distributions