🤖 AI Summary
Problem: Automatic Text Simplification (ATS) suffers from weak correlations among readability metrics, automatic evaluation scores, and human judgments, which the authors relate to the lack of a precise conceptual definition of “simplification.” Method: This study conducts a systematic correlation analysis on English ATS, comparing traditional readability formulas and linguistic features with human ratings and mainstream automatic metrics (e.g., BLEU, SARI). Contribution/Results: Existing readability measures generally correlate weakly with both human judgments and automatic metrics, exposing a conceptual fragmentation in current ATS evaluation. The paper argues for reconceptualizing ATS around the anchor construct of “readability improvement” and for a unified evaluation framework that jointly accounts for linguistic acceptability, semantic fidelity, and measurable readability gain. This work offers both theoretical reflection and methodological guidance for ATS evaluation.
📝 Abstract
Readability is a key concept in the current era of abundant written information. To help make texts more readable and information more accessible to everyone, a line of research aims at making texts accessible to their target audience: automatic text simplification (ATS). Recent studies have examined the correlations between automatic evaluation metrics in ATS and human judgment. However, the correlations between these two aspects and commonly available readability measures (such as readability formulas or linguistic features) have received less attention. In this work, we investigate the place of readability measures in ATS for English, complementing existing studies on evaluation metrics and human judgment. We first discuss the relationship between ATS and readability research, then report a study of the correlations between readability measures and human judgment, and between readability measures and ATS evaluation metrics. We find that, in general, readability measures do not correlate well with automatic metrics or human judgment. We argue that, since the three angles from which simplification can be assessed tend to correlate rather weakly with one another, ATS needs a clear definition of its construct.
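The kind of analysis described above, correlating a readability formula with human ratings of simplified outputs, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: it uses the standard Flesch Reading Ease formula with a crude, hypothetical vowel-group syllable counter, and computes Spearman's rank correlation with the Python standard library only. The helper names (`count_syllables`, `flesch_reading_ease`, `rank`, `pearson`, `spearman`) and the example texts and ratings are hypothetical placeholders, not data or code from the paper.

```python
import re
from statistics import mean


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher means easier. Assumes non-empty text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) for constant inputs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical system outputs and hypothetical human simplicity ratings (1-5).
outputs = [
    "The cat sat on the mat.",
    "The feline positioned itself upon the rug.",
    "Notwithstanding considerable opposition, the legislation was ultimately enacted.",
]
ratings = [5, 3, 1]

fre_scores = [flesch_reading_ease(t) for t in outputs]
print(spearman(fre_scores, ratings))
```

In a real study, the same loop would be repeated for each readability measure against each human-judgment dimension and each automatic metric (e.g., SARI), typically with a library implementation such as `scipy.stats.spearmanr` that also reports significance.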