On Text Simplification Metrics and General-Purpose LLMs for Accessible Health Information, and a Potential Architectural Advantage of the Instruction-Tuned LLM Class

📅 2025-11-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the core challenge in large-scale biomedical text simplification, namely balancing readability and semantic fidelity, to improve public access to health information. We propose a large language model (LLM)-driven simplification framework based on instruction tuning. For the first time, we systematically validate the architectural advantages of Mistral-24B for this task: it significantly outperforms Qwen2.5-32B across 21 quantitative metrics spanning readability, faithfulness, and safety, including SARI (mean 42.46) and BERTScore (0.91 vs. 0.89), achieving near-human-level semantic consistency. Our work establishes a reproducible technical pipeline and a comprehensive evaluation benchmark for high-fidelity, popular-science-oriented biomedical text generation.

📝 Abstract
The increasing health-seeking behavior and digital consumption of biomedical information by the general public necessitate scalable solutions for automatically adapting complex scientific and technical documents into plain language. However, automatic text simplification solutions, including advanced large language models, continue to face challenges in reliably arbitrating the tension between optimizing readability and preserving discourse fidelity. This report empirically assesses the performance of two major classes of general-purpose LLMs, demonstrating their linguistic capabilities and foundational readiness for the task against a human benchmark. Through a comparative analysis of the instruction-tuned Mistral 24B and the reasoning-augmented Qwen2.5 32B, we identify a potential architectural advantage in the instruction-tuned LLM. Mistral exhibits a tempered lexical simplification strategy that enhances readability across a suite of metrics, including the simplification-specific formula SARI (mean 42.46), while preserving human-level discourse fidelity with a BERTScore of 0.91. Qwen2.5 also attains enhanced readability, but its operational strategy shows a disconnect in balancing readability against accuracy, reaching a statistically significantly lower BERTScore of 0.89. Additionally, a comprehensive correlation analysis of 21 metrics, spanning readability, discourse fidelity, content safety, and underlying distributional measures for mechanistic insight, confirms strong functional redundancies among five readability indices. This empirical evidence tracks the baseline performance of evolving LLMs on text simplification, identifies the instruction-tuned Mistral 24B as the stronger simplifier, provides heuristics for metric selection, and points to lexical support as a primary domain-adaptation issue for simplification.
Problem

Research questions and friction points this paper is trying to address.

Automatically simplifying complex biomedical documents into plain language
Balancing readability optimization with discourse fidelity preservation in text simplification
Evaluating architectural advantages of instruction-tuned LLMs for accessible health information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Instruction-tuned LLMs enhance readability and preserve discourse
Comparative analysis identifies architectural advantage in Mistral 24B
Lexical support is key domain-adaptation issue for simplification
P. Bilha Githinji
Tsinghua University, Tsinghua-Berkeley Shenzhen Institute, Shenzhen, China
Aikaterini Melliou
Tsinghua University, Tsinghua-Berkeley Shenzhen Institute, Shenzhen, China
Peiwu Qin
Tsinghua Shenzhen International Graduate School
Image Processing, TCM