AI Summary
This study addresses the tendency of large language models (LLMs) to exhibit “argument collapse” when generating public discourse, significantly undermining argumentative diversity. Through a systematic comparison of arguments produced by humans and LLMs in debates published by The New York Times and Boston Review, combined with large-scale text analysis, argument extraction, and structural pattern recognition, the work quantifies for the first time the systematic convergence of LLM-generated content at the levels of main claims, sub-claims, and discourse structure. Results reveal that the uniqueness of LLM-generated main claims is only 3.4%, drastically lower than the human baseline of 65.3%. Even when explicitly prompted for diversity, LLMs fail to adequately span the human argument space, producing overly generic sub-claims and highly rigid rhetorical structures.
Abstract
As LLMs are increasingly used to draft public-facing arguments, they may flatten public debate by repeatedly introducing the same polished, plausible arguments. We study argument collapse, the tendency of essays generated by different LLMs to converge to a smaller set of main arguments, sub-arguments, and paragraph-level structures. We compare 1,039 human responses from 195 New York Times (NYT) debates, 448 human responses from 61 longer-form Boston Review (BR) forums, and 23,384 LLM-generated essays. In the NYT corpus, 65.3% of human main arguments are unique within a debate, compared to 3.4% of LLM main arguments. Asking LLMs to generate diverse answers adds variation, but a typical model recovers only about half of the distinct human main arguments, with much of the added variation falling outside the observed human argument space. Collapse also appears in sub-arguments, where among essays with the same main argument, 41.0% of human sub-arguments are unique versus 9.1% from LLM responses. Qualitatively, LLMs often reuse generalized and hedged sub-arguments, while humans prefer more concrete and topic-specific ones. Structure-wise, LLM-generated essays tend to follow a more fixed arc, often opening with a direct claim and moving quickly toward proposals. The same patterns hold in longer BR essays, suggesting that argument collapse extends beyond short-form responses.
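To make the two headline quantities concrete, here is a minimal sketch of how a within-debate uniqueness rate and a human-argument recovery rate could be computed. It assumes each essay has already been mapped to a main-argument label, and it assumes "unique" means a label that occurs exactly once within its debate; the abstract does not spell out either step, so the function names, the labels, and the toy data are all hypothetical.

```python
from collections import Counter


def unique_share(labels_by_debate):
    """Fraction of arguments whose label occurs exactly once in its own debate.

    Assumption: this is one plausible reading of "unique within a debate",
    not necessarily the definition used in the paper.
    """
    total = 0
    unique = 0
    for labels in labels_by_debate.values():
        counts = Counter(labels)
        total += len(labels)
        unique += sum(1 for label in labels if counts[label] == 1)
    return unique / total if total else 0.0


def human_coverage(human_labels, llm_labels):
    """Fraction of distinct human labels that also appear among the LLM labels."""
    human = set(human_labels)
    if not human:
        return 0.0
    return len(human & set(llm_labels)) / len(human)


# Toy data invented for illustration; not drawn from the NYT or BR corpora.
human = {"debate_1": ["a", "b", "c", "a"], "debate_2": ["x", "y"]}
llm = {"debate_1": ["a", "a", "a", "d"], "debate_2": ["x", "x"]}

print(f"human unique share: {unique_share(human):.3f}")
print(f"LLM unique share:   {unique_share(llm):.3f}")
print(f"coverage, debate_1: {human_coverage(human['debate_1'], llm['debate_1']):.3f}")
```

In the toy data, label "d" raises LLM variation without matching any human label, which mirrors the abstract's point that added variation can fall outside the observed human argument space.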