🤖 AI Summary
Public health literacy deficits hinder comprehension of professional medical information. Method: As part of the Plain Language Adaptation of Biomedical Abstracts (PLABA) track, this work adapts English biomedical abstracts from PubMed, selected for common health questions asked on MedlinePlus, into plain language at the sentence level. The authors fine-tune and evaluate leading open-source large language models (LLMs) suited to dialog use cases alongside GPT-4-based systems, and compare all systems against other participants' submissions. Results: The top-performing GPT-4-based model ranked first on the average simplicity measure and third on the average accuracy measure. This work supports medical information equity and offers a reproducible methodology for AI-driven plain-language adaptation of health communication.
📝 Abstract
A vast amount of medical knowledge is available for public use through online health forums and question-answering platforms on social media. However, the majority of the population in the United States lacks the health literacy needed to make the best use of that information. Health literacy is the ability to obtain and comprehend basic health information in order to make appropriate health decisions. To bridge this gap, organizations advocate adapting medical knowledge into plain language. Building robust systems that automate this adaptation helps both medical and non-medical professionals make the best use of the information available online. The goal of the Plain Language Adaptation of Biomedical Abstracts (PLABA) track is to adapt English biomedical abstracts extracted from PubMed, based on questions the general public asks on MedlinePlus, into plain language at the sentence level. As part of this track, we leveraged the best open-source Large Language Models suited to and fine-tuned for dialog use cases. We compare and present the results for all of our systems, along with our ranking among the other participants' submissions. Our top-performing GPT-4-based model ranked first on the avg. simplicity measure and third on the avg. accuracy measure.