🤖 AI Summary
This study addresses the core challenge of large-scale biomedical text simplification, namely balancing readability against semantic fidelity, to improve public access to health information. We propose a large language model (LLM)-driven simplification framework based on instruction tuning and, for the first time, systematically validate the architectural advantage of Mistral 24B for this task: it significantly outperforms Qwen2.5 32B on simplification quality, reaching a mean SARI of 42.46 and a BERTScore of 0.91 (vs. 0.89) and achieving near-human semantic consistency, as measured on an evaluation suite of 21 quantitative metrics spanning readability, faithfulness, and safety. Our work establishes a reproducible technical pipeline and a comprehensive evaluation benchmark for high-fidelity, lay-audience-oriented biomedical text generation.
📝 Abstract
The general public's growing health-seeking behavior and digital consumption of biomedical information call for scalable methods that automatically adapt complex scientific and technical documents into plain language. However, automatic text simplification systems, including advanced large language models (LLMs), still struggle to reliably balance gains in readability against preservation of discourse fidelity. This report empirically assesses two major classes of general-purpose LLMs, gauging their linguistic capabilities and readiness for the task against a human benchmark. Through a comparative analysis of the instruction-tuned Mistral 24B and the reasoning-augmented Qwen2.5 32B, we identify a potential architectural advantage of the instruction-tuned model. Mistral exhibits a tempered lexical simplification strategy that improves readability across a suite of standard metrics and the simplification-specific SARI score (mean 42.46), while preserving human-level discourse fidelity with a BERTScore of 0.91. Qwen also improves readability, but its operational strategy shows a disconnect in balancing readability against accuracy, yielding a statistically significantly lower BERTScore of 0.89. In addition, a comprehensive correlation analysis of 21 metrics, spanning readability, discourse fidelity, content safety, and underlying distributional measures used for mechanistic insight, confirms strong functional redundancy among five readability indices. This empirical evidence establishes baseline performance of evolving LLMs on text simplification, identifies the instruction-tuned Mistral 24B as the stronger simplifier, offers practical heuristics for metric selection, and points to lexical support as a primary domain-adaptation issue for simplification.
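The readability indices whose redundancy the correlation analysis confirms are typically formula-based surface measures over sentence and word counts. As an illustration only (not the study's implementation), one such index, Flesch Reading Ease, can be sketched as follows; the syllable counter is a naive vowel-group heuristic introduced here for the sketch:

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel groups, drop a trailing silent 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Comparing a plain sentence with jargon-heavy biomedical prose reproduces the expected ordering, which is why several such indices tend to correlate strongly: they all reward shorter sentences and shorter words.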