PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of leveraging unstructured patient-generated text, which contains rich socio-experiential information but lacks a standardized representation for patient-centered research and clinical quality improvement. To bridge this gap, the authors introduce PVminer, a benchmark for structured extraction of the patient voice, and propose PVminerLLM, a supervised fine-tuning approach that uses large language models to jointly extract codes, sub-codes, and supporting evidence spans. Even with minimal fine-tuning data, the method substantially outperforms prompt-based baselines, achieving up to 83.82% F1 for code prediction, 80.74% F1 for sub-code prediction, and 87.03% F1 for evidence span extraction, confirming its efficacy and practical utility in structuring patient narratives for downstream applications.
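The joint extraction described above, producing a code, a sub-code, and a supporting evidence span per finding, can be pictured as a structured record emitted by the model. A minimal sketch of parsing and validating such output; the field names and label values here are hypothetical illustrations, not the actual PVminer schema:

```python
import json

# Hypothetical JSON output of a fine-tuned extraction model: for each
# patient message, a list of (code, sub-code, evidence span) triples.
# The codes and spans below are invented examples, not PVminer labels.
model_output = """
[
  {"code": "Social Circumstances",
   "subcode": "Transportation",
   "span": "I missed my appointment because the bus never came"},
  {"code": "Engagement in Care",
   "subcode": "Medication Adherence",
   "span": "I stopped taking the pills when I felt better"}
]
"""

def parse_extractions(raw: str) -> list[dict]:
    """Parse the model's JSON output, keeping only complete records."""
    records = json.loads(raw)
    required = {"code", "subcode", "span"}
    return [r for r in records if required <= r.keys()]

extractions = parse_extractions(model_output)
print(len(extractions))        # 2
print(extractions[0]["code"])  # Social Circumstances
```

Serializing all three fields into one JSON target is one natural way to train a single model to predict them jointly, as the summary describes, rather than with separate per-task heads.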

📝 Abstract
Motivation: Patient-generated text contains critical information about patients' lived experiences, social circumstances, and engagement in care, including factors that strongly influence adherence, care coordination, and health equity. However, these patient voice signals are rarely available in structured form, limiting their use in patient-centered outcomes research and clinical quality improvement. Reliable extraction of such information is therefore essential for understanding and addressing non-clinical drivers of health outcomes at scale.

Results: We introduce PVminer, a benchmark for structured extraction of patient voice, and propose PVminerLLM, a supervised fine-tuned large language model tailored to this task. Across multiple datasets and model sizes, PVminerLLM substantially outperforms prompt-based baselines, achieving up to 83.82% F1 for Code prediction, 80.74% F1 for Sub-code prediction, and 87.03% F1 for evidence Span extraction. Notably, strong performance is achieved even with smaller models, demonstrating that reliable patient voice extraction is feasible without extreme model scale. These results enable scalable analysis of social and experiential signals embedded in patient-generated text.

Availability and Implementation: Code, evaluation scripts, and trained LLMs will be released publicly. Annotated datasets will be made available upon request for research use.

Keywords: Large Language Models, Supervised Fine-Tuning, Medical Annotation, Patient-Generated Text, Clinical NLP
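The abstract reports an F1 score for evidence span extraction alongside the code and sub-code F1s. As a rough illustration of how a predicted span might be scored against a gold annotation, here is a token-overlap F1 in the style commonly used for extractive QA; the paper's exact span metric is not specified on this page, so treat this as an assumed approximation:

```python
from collections import Counter

def span_f1(predicted: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold evidence span
    (SQuAD-style; the paper's actual span metric may differ)."""
    pred_toks = predicted.lower().split()
    gold_toks = gold.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

# A partial match earns partial credit: 4/4 predicted tokens and
# 4/5 gold tokens overlap, so F1 = 2 * 1.0 * 0.8 / 1.8.
print(round(span_f1("the bus never came",
                    "because the bus never came"), 2))  # 0.89
```

A soft overlap metric like this rewards near-miss spans, which matters for patient narratives where annotators may disagree on exact span boundaries.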
Problem

Research questions and friction points this paper is trying to address.

Patient-Generated Text
Patient Voice
Structured Extraction
Clinical NLP
Health Equity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Supervised Fine-Tuning
Patient-Generated Text
Structured Extraction
Clinical NLP
Authors
Samah Fodeh
Yale University, New Haven, CT, USA.
Linhai Ma
Yale University
Deep learning, Medical signal/image analysis, Concurrency
Ganesh Puthiaraju
Yale University, New Haven, CT, USA.
Srivani Talakokkul
Yale University, New Haven, CT, USA.
Afshan Khan
Yale University, New Haven, CT, USA.
Ashley Hagaman
Yale University, New Haven, CT, USA.
Sarah Lowe
Yale University, New Haven, CT, USA.
Aimee Roundtree
Texas State University, San Marcos, TX, USA.