German4All - A Dataset and Model for Readability-Controlled Paraphrasing in German

📅 2025-08-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses readability-controlled paraphrasing of German text for readers at different proficiency levels. We introduce German4All, the first large-scale German dataset of aligned, paragraph-level paraphrases across five readability levels (over 25,000 samples), synthesized with GPT-4 and evaluated through both human and LLM-based judgments. Using this dataset, we train a paraphrasing model that can be steered to a chosen discrete readability level. Our key contributions are: (1) releasing the first open-source, multi-level German readability-controlled paraphrasing dataset; (2) open-sourcing a readability-controlled paraphrasing model; and (3) achieving state-of-the-art performance on German text simplification. The approach aims to make texts more accessible to diverse audiences, including children, second-language learners, and individuals with cognitive impairments, by enabling level-aware simplification.

📝 Abstract
The ability to paraphrase texts across different complexity levels is essential for creating accessible texts that can be tailored toward diverse reader groups. Thus, we introduce German4All, the first large-scale German dataset of aligned readability-controlled, paragraph-level paraphrases. It spans five readability levels and comprises over 25,000 samples. The dataset is automatically synthesized using GPT-4 and rigorously evaluated through both human and LLM-based judgments. Using German4All, we train an open-source, readability-controlled paraphrasing model that achieves state-of-the-art performance in German text simplification, enabling more nuanced and reader-specific adaptations. We open-source both the dataset and the model to encourage further research on multi-level paraphrasing.

Problem

Research questions and friction points this paper is trying to address.

Creating readability-controlled paraphrases for German texts
Addressing the lack of large-scale German paraphrasing datasets
Enabling tailored text adaptations for diverse reader groups

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPT-4-synthesized German readability dataset
Open-source readability-controlled paraphrasing model
Multi-level text simplification for diverse readers

Authors

Miriam Anschütz
PhD Student of Computer Science, Technical University of Munich
Natural language processing, easy-to-read, text simplification

Thanh Mai Pham
Technical University of Munich

Eslam Nasrallah
Technical University of Munich

Maximilian Müller
Technical University of Munich

Cristian-George Craciun
Technical University of Munich

Georg Groh
Adjunct Professor
Social Computing, Natural Language Processing