🤖 AI Summary
The absence of evaluation datasets for Automatic Text Simplification (ATS) in Spanish legal-administrative texts hinders progress in this domain.
Method: This study constructs the first plain-language dataset specifically for Spanish social security administrative content, LengClaro2023, built from the most frequently used procedures on the Spanish Social Security website. Each source text is manually rewritten into two versions: a baseline simplification that follows the recommendations of *arText claro*, and an enhanced version that adds recommendations from plain language guidelines (logical restructuring, terminological consistency, and syntactic simplification) while strictly preserving factual accuracy and domain-specific terminology.
Contribution/Results: The publicly released dataset contains hundreds of high-quality source–simplification pairs. Its dual-gradient annotation schema, in which each source text has two simplifications of increasing depth, enables fine-grained, multi-dimensional evaluation of ATS models along readability, accuracy, and practical utility; this is the first such framework for Spanish legal-administrative ATS. It further establishes an empirical benchmark for developing and validating plain language policy.
📝 Abstract
In this work, we present LengClaro2023, a dataset of legal-administrative texts in Spanish. Based on the most frequently used procedures from the Spanish Social Security website, we have created for each text two simplified equivalents. The first version follows the recommendations provided by arText claro. The second version incorporates additional recommendations from plain language guidelines to explore further potential improvements in the system. The linguistic resource created in this work can be used for evaluating automatic text simplification (ATS) systems in Spanish.
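As a rough illustration of how such a resource might be used for evaluation, the sketch below scores one system output against both manual simplifications of the same source text. Everything in it is an assumption made for illustration: the `Entry` record layout, its field names, and the choice of a simple bag-of-words F1 are hypothetical and are not taken from the dataset or from the authors' evaluation setup. A real study would more likely use established simplification metrics.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical record: one source text and its two manual simplifications."""
    source: str
    simplified_artext: str  # version following the arText claro recommendations
    simplified_plain: str   # version with additional plain language guidelines


def token_f1(candidate: str, reference: str) -> float:
    """Bag-of-words F1 between two texts (lowercased, whitespace-tokenized)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def score_against_both(system_output: str, entry: Entry) -> dict[str, float]:
    """Compare a system output with each of the two reference simplifications."""
    return {
        "artext": token_f1(system_output, entry.simplified_artext),
        "plain": token_f1(system_output, entry.simplified_plain),
    }


# Invented toy texts, not taken from LengClaro2023.
example = Entry(
    source="El interesado deberá aportar la documentación acreditativa.",
    simplified_artext="Usted debe aportar la documentación acreditativa.",
    simplified_plain="Usted debe presentar los documentos que lo demuestren.",
)
scores = score_against_both("Usted debe presentar los documentos.", example)
```

Reporting the two scores separately, rather than averaging them, keeps visible whether a system lands closer to the baseline simplification or to the more thoroughly simplified version.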