LengClaro2023: A Dataset of Administrative Texts in Spanish with Plain Language adaptations

📅 2025-06-06

🤖 AI Summary

Problem: The absence of evaluation datasets for Automatic Text Simplification (ATS) of Spanish legal-administrative texts hinders progress in this domain. Method: This study builds a monolingual Spanish dataset of social security administrative content, drawn from the most frequently used procedures on the Spanish Social Security website. Each source text is manually rewritten into two versions: a baseline simplification that follows the recommendations of the arText claro system, and an extended version that incorporates additional plain language guidelines (for example, logical restructuring, terminological consistency, and syntactic simplification) while preserving factual accuracy and domain-specific terminology. Contribution/Results: The dataset is publicly released as a set of source–simplification pairs. Because every source text has two simplified versions of different depth, ATS outputs can be compared against both references, supporting finer-grained evaluation of Spanish legal-administrative ATS along dimensions such as readability, accuracy, and practical usefulness. The resource can also inform the development and validation of plain language practices in public administration.

📝 Abstract

In this work, we present LengClaro2023, a dataset of legal-administrative texts in Spanish. Starting from the most frequently used procedures on the Spanish Social Security website, we created two simplified equivalents of each text. The first version follows the recommendations provided by arText claro. The second version incorporates additional recommendations from plain language guidelines to explore further potential improvements to that system. The linguistic resource created in this work can be used to evaluate automatic text simplification (ATS) systems in Spanish.
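Each source text therefore yields one record with three aligned texts. The following is a minimal sketch of how such a record might be represented and unfolded into source–reference pairs for ATS evaluation. The field names (`doc_id`, `original`, `artext_claro`, `plain_language`) and the helper `reference_pairs` are hypothetical, not the released file schema, and the Spanish sentences are invented for illustration rather than taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LengClaroRecord:
    """Hypothetical dataset entry: one source text and its two simplifications.

    Field names are illustrative assumptions, not the released schema.
    """
    doc_id: str
    original: str        # legal-administrative source text
    artext_claro: str    # version following arText claro recommendations
    plain_language: str  # version adding further plain language guidelines


def reference_pairs(records):
    """Yield (source, reference, version_label) tuples for ATS evaluation.

    Each source text contributes one pair per simplified version.
    """
    for rec in records:
        yield rec.original, rec.artext_claro, "artext_claro"
        yield rec.original, rec.plain_language, "plain_language"


# Invented example sentences, not taken from LengClaro2023.
sample = LengClaroRecord(
    doc_id="demo-1",
    original="El interesado deberá cumplimentar el formulario.",
    artext_claro="Usted debe rellenar el formulario.",
    plain_language="Rellene el formulario.",
)
pairs = list(reference_pairs([sample]))
print(len(pairs))  # prints 2
```

Keeping both versions in one record makes it straightforward to score a system against either reference, or against both, without re-aligning texts.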

Problem

Research questions and friction points this paper is trying to address.

Create a Spanish legal-administrative text dataset

Generate simplified versions following plain language guidelines

Support the evaluation of automatic text simplification systems in Spanish

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset of Spanish legal-administrative texts paired with simplifications

Two simplified versions per text: one following arText claro, one adding further plain language guidelines

Resource for evaluating text simplification systems
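As a resource for evaluation, each system output can be scored separately against the two simplified versions. The sketch below assumes a simple unigram-overlap F1 as a stand-in metric; it is not a metric reported for LengClaro2023 and is much cruder than established ATS measures such as SARI or readability indices. The function names and example sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep word characters; \w matches accented letters
    # because Python 3 str patterns are Unicode-aware by default.
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between one system output and one reference."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped counts of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def score_against_versions(output: str, references: dict[str, str]) -> dict[str, float]:
    """Score one ATS output separately against each simplified version."""
    return {label: unigram_f1(output, ref) for label, ref in references.items()}


# Invented example references, not taken from LengClaro2023.
refs = {
    "artext_claro": "Usted debe rellenar el formulario.",
    "plain_language": "Rellene el formulario.",
}
scores = score_against_versions("Rellene el formulario.", refs)
print(scores)
```

Scoring against each version separately, instead of against a merged reference, keeps the two simplification levels distinguishable in the results.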
