🤖 AI Summary
This study addresses safety risks arising from medical information distortion in consumer health question (CHQ) summarization. To enhance factual fidelity and cross-lingual reliability, we propose a hybrid extractive-generative medical text summarization framework. Methodologically, it integrates TextRank-based sentence extraction, domain-specific medical named entity recognition (NER), and a fine-tuned LLaMA-2-7B model optimized for clinical accuracy. We evaluate the framework on the English MeQSum and Bangla BanglaCHQ-Summ datasets using automated metrics—including ROUGE, BERTScore, SummaC, and AlignScore—as well as human evaluation of critical information retention. Results demonstrate substantial improvements over zero-shot baselines and existing systems: over 80% of generated summaries are judged in human evaluation to fully preserve essential medical facts. The framework thus advances the robustness, clinical safety, and cross-lingual applicability of CHQ summarization, mitigating misinformation risks in multilingual health communication.
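The summary does not include the authors' code, but the extractive stage it describes can be illustrated with a minimal, self-contained sketch: TextRank run over a word-overlap sentence graph, with a toy keyword gazetteer standing in for the domain-specific medical NER model. All names, the gazetteer, and the 0.1 entity-bonus weight are illustrative assumptions, not the paper's implementation.

```python
import math
import re

def sentence_similarity(a, b):
    """Word-overlap similarity used as a TextRank edge weight."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if not (wa and wb) or not (wa & wb):
        return 0.0
    return len(wa & wb) / (math.log(len(wa) + 1) + math.log(len(wb) + 1))

def textrank(sentences, damping=0.85, iters=50):
    """Plain power-iteration TextRank over the sentence-similarity graph."""
    n = len(sentences)
    sim = [[sentence_similarity(sentences[i], sentences[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out = [sum(row) for row in sim]  # total outgoing weight per sentence
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(sim[j][i] / out[j] * scores[j]
                                  for j in range(n)
                                  if out[j] > 0 and sim[j][i] > 0)
                  for i in range(n)]
    return scores

# Hypothetical stand-in for a medical NER model: a tiny keyword gazetteer.
MEDICAL_TERMS = {"insulin", "diabetes", "dosage", "metformin", "blood", "sugar"}

def ner_bonus(sentence):
    """Count of medical terms, as a proxy for NER-detected entities."""
    return len(set(re.findall(r"\w+", sentence.lower())) & MEDICAL_TERMS)

def extract_salient(question, top_k=3):
    """Keep the top_k sentences by TextRank score plus an entity bonus."""
    sents = [s.strip() for s in re.split(r"(?<=[.?!])\s+", question) if s.strip()]
    tr = textrank(sents)
    ranked = sorted(range(len(sents)),
                    key=lambda i: tr[i] + 0.1 * ner_bonus(sents[i]),
                    reverse=True)
    keep = sorted(ranked[:top_k])  # restore original sentence order
    return [sents[i] for i in keep]

question = ("My mother has diabetes and takes metformin. "
            "Her blood sugar is still high after meals. "
            "I watched a video about diets last week. "
            "Should her dosage be increased?")
print(extract_salient(question))
```

In this sketch the off-topic third sentence is dropped while the entity-bearing ones are retained; in the actual framework the retained sentences and recognized entities would then condition the fine-tuned LLaMA-2-7B generator.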
📝 Abstract
Summarizing consumer health questions (CHQs) can ease communication in healthcare, but unfaithful summaries that misrepresent medical details pose serious risks. We propose a framework that combines TextRank-based sentence extraction and medical named entity recognition with large language models (LLMs) to enhance faithfulness in medical text summarization. In our experiments, we fine-tuned the LLaMA-2-7B model on the MeQSum (English) and BanglaCHQ-Summ (Bangla) datasets, achieving consistent improvements across quality (ROUGE, BERTScore, readability) and faithfulness (SummaC, AlignScore) metrics, and outperforming zero-shot baselines and prior systems. Human evaluation further shows that over 80% of generated summaries preserve critical medical information. These results highlight faithfulness as an essential dimension for reliable medical summarization and demonstrate the potential of our approach for safer deployment of LLMs in healthcare contexts.
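Of the automated quality metrics named in the abstract, ROUGE-1 is simple enough to sketch directly. The following is a minimal unigram-overlap F1 for illustration only; the example strings are invented, and real evaluations would use a standard implementation (e.g. the `rouge-score` package) with stemming and proper tokenization.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Illustrative system vs. reference summary of a CHQ.
score = rouge1_f1(
    "increase metformin dosage for high blood sugar",
    "should the metformin dosage be increased for high blood sugar",
)
print(round(score, 4))  # → 0.7059
```

SummaC and AlignScore, by contrast, score faithfulness with learned entailment/alignment models rather than n-gram overlap, which is why the paper reports both families of metrics.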