CLEAR: A Comprehensive Linguistic Evaluation of Argument Rewriting by Large Language Models

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates large language models’ (LLMs) text rewriting behavior in argument improvement (ArgImp). To address the fragmentation of existing evaluations, we propose CLEAR—a novel, interpretable, multi-dimensional automated evaluation framework for argumentative texts, incorporating 57 linguistically grounded metrics across lexical, syntactic, semantic, and pragmatic levels. Empirical analysis across multiple argumentation corpora reveals that LLM-based rewriting systematically shortens text length, increases lexical complexity (e.g., word length), and enhances syntactic integration—thereby improving persuasiveness and coherence. This work is the first to uncover cross-level linguistic transformation patterns induced by LLMs in argument rewriting, establishing a theoretical foundation and methodological toolkit for controllable argument generation and interpretable evaluation of argumentative language models.

📝 Abstract
While LLMs have been extensively studied on general text generation tasks, text rewriting, and in particular how models behave on it, has received less attention. In this paper we analyze what changes LLMs make in a text rewriting setting. We focus specifically on argumentative texts and their improvement, a task named Argument Improvement (ArgImp). We present CLEAR: an evaluation pipeline consisting of 57 metrics mapped to four linguistic levels: lexical, syntactic, semantic, and pragmatic. We use this pipeline to examine the qualities of LLM-rewritten arguments on a broad set of argumentation corpora and to compare the behavior of different LLMs on this task in terms of linguistic levels. Taking all four linguistic levels into consideration, we find that the models perform ArgImp by shortening the texts while simultaneously increasing average word length and merging sentences. Overall, we note an increase in the persuasion and coherence dimensions.
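The reported pattern (shorter texts, longer average words, fewer/merged sentences) can be checked with simple surface metrics of the kind CLEAR tracks at the lexical and syntactic levels. The sketch below is purely illustrative, not the paper's actual implementation; the regex-based tokenization and the example texts are assumptions for demonstration.

```python
import re

def surface_metrics(text: str) -> dict:
    """Compute a few surface-level metrics (word count, average word
    length, sentence count) via naive regex tokenization. Illustrative
    only; CLEAR's 57 metrics are more extensive and more carefully
    implemented."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "num_words": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "num_sentences": len(sentences),
    }

# Hypothetical before/after pair showing the direction of change the
# paper reports for LLM rewrites:
original = "Cars are bad. They pollute a lot. We should ban them."
rewrite = "Automobiles should be banned because they cause substantial pollution."

before, after = surface_metrics(original), surface_metrics(rewrite)
# Expected pattern: fewer words, longer words, merged (fewer) sentences.
```

Comparing `before` and `after` on such a pair shows the three shifts the abstract describes moving in the same direction at once, which is the kind of cross-level observation the full pipeline makes systematic.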
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' argument rewriting behavior
Analyzing linguistic changes across four levels
Assessing persuasion and coherence improvements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive evaluation pipeline with 57 metrics
Analyzes four linguistic levels simultaneously
Compares multiple LLMs on argument improvement