Argument Collapse: LLMs Flatten Long-Form Public Debate

Date: 2026-06-01
Citations: 0
Influential: 0

AI Summary
This study examines “argument collapse”: the tendency of large language models (LLMs) to converge on a small set of arguments when generating public discourse, which reduces argumentative diversity. It compares human and LLM-written responses to debates published by The New York Times and Boston Review, using large-scale text analysis, argument extraction, and structural pattern recognition to quantify convergence at the levels of main claims, sub-claims, and discourse structure. In the NYT corpus, only 3.4% of LLM main claims are unique within a debate, compared with 65.3% of human main claims. Even when explicitly prompted for diversity, a typical model recovers only about half of the distinct human main claims, and LLM essays tend toward generic, hedged sub-claims and a fixed rhetorical structure.
Abstract
As LLMs are increasingly used to draft public-facing arguments, they may flatten public debate by repeatedly introducing the same polished, plausible arguments. We study argument collapse, the tendency of essays generated by different LLMs to converge to a smaller set of main arguments, sub-arguments, and paragraph-level structures. We compare 1,039 human responses from 195 New York Times (NYT) debates, 448 human responses from 61 longer-form Boston Review (BR) forums, and 23,384 LLM-generated essays. In the NYT corpus, 65.3% of human main arguments are unique within a debate, compared to 3.4% of LLM main arguments. Asking LLMs to generate diverse answers adds variation, but a typical model recovers only about half of the distinct human main arguments, with much of the added variation falling outside the observed human argument space. Collapse also appears in sub-arguments, where among essays with the same main argument, 41.0% of human sub-arguments are unique versus 9.1% from LLM responses. Qualitatively, LLMs often reuse generalized and hedged sub-arguments, while humans prefer more concrete and topic-specific ones. Structure-wise, LLM-generated essays tend to follow a more fixed arc, often opening with a direct claim and moving quickly toward proposals. The same patterns hold in longer BR essays, suggesting that argument collapse extends beyond short-form responses.
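The uniqueness and recovery figures quoted in the abstract can be illustrated with a small sketch. It is not the authors' pipeline. It assumes every essay has already been mapped to a main-argument label (how the paper does that mapping is not described here), it treats an argument as "unique" when no other essay in the same debate shares its label, and it pools counts across debates rather than averaging per debate. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def unique_rate(labels_by_debate):
    """Fraction of arguments whose label occurs exactly once in its debate.

    Assumption: counts are pooled over all debates, not averaged per debate.
    """
    unique = 0
    total = 0
    for labels in labels_by_debate.values():
        counts = Counter(labels)
        unique += sum(1 for label in labels if counts[label] == 1)
        total += len(labels)
    return unique / total if total else 0.0


def recovery_rate(human_labels, llm_labels):
    """Share of distinct human labels that also appear among the LLM labels."""
    human = set(human_labels)
    if not human:
        return 0.0
    return len(human & set(llm_labels)) / len(human)


# Toy labels, invented for illustration only.
human = {"debate_1": ["a", "b", "c", "c"], "debate_2": ["x", "y"]}
llm = {"debate_1": ["a", "a", "a", "d"], "debate_2": ["x", "x"]}

human_rate = unique_rate(human)
llm_rate = unique_rate(llm)
recovered = recovery_rate(human["debate_1"], llm["debate_1"])
print(f"human unique: {human_rate:.3f}, LLM unique: {llm_rate:.3f}, recovered: {recovered:.3f}")
```

In the toy data, the LLM label "d" adds variation but matches no human label, which mirrors the abstract's point that added variation can fall outside the observed human argument space.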
Problem

Research questions and friction points this paper is trying to address.

argument collapse
large language models
public debate
argument diversity
LLM-generated essays
Innovation

Methods, ideas, or system contributions that make the work stand out.

argument collapse
large language models
public debate
argument diversity
text generation