Towards Multidisciplinary Summarization of Hospital Stays: Efficient Sentence-Level Clinical Provenance Categorization

📅 2026-06-01

🤖 AI Summary
This study addresses the incoherent summaries that result when multidisciplinary free-text notes are pooled directly in high-complexity clinical settings such as the NICU, where sentence-level provenance must be categorized accurately before summarization. The authors propose a clinical provenance categorization pipeline based on supervised fine-tuning (SFT) of Llama-3 large language models (8B and 70B parameters), trained on MedSecId, a corpus of 2,002 annotated MIMIC-III notes, and evaluated cross-domain on a gold-standard set of 227 sentence-level spans from three NICU summaries. In-domain, both models attain Macro F1 above 92%. Cross-domain, the effect of SFT depends on scale: it changes the 8B model only marginally but raises the 70B model's Macro F1 by 7%. The quantized fine-tuned 70B model also outperforms its full-precision baseline while substantially reducing computational requirements.
📝 Abstract
Effective "all-team" summarization in high-complexity settings like the Neonatal Intensive Care Unit (NICU) requires aggregating insights from diverse disciplines (physicians, nurses, therapists) spread across hundreds of clinical free-text notes. Simply pooling heterogeneous text often leads to incoherent outputs. Structured summarization therefore first requires accurate categorization of sentence-level provenance across multi-source notes. This pilot study introduces a clinical provenance categorization pipeline using supervised fine-tuning (SFT) of large language models (LLMs). We adapted two Llama-3 models (8B and 70B) to MedSecId, a corpus of 2,002 MIMIC-III (Adult ICU) notes annotated with clinical provenance headers, achieving in-domain Macro F1 scores above 92% for both models. To evaluate cross-domain generalization, we assessed model capacity (8B vs. 70B) and quantization on a gold-standard dataset of 227 sentence-level spans derived from three multi-disciplinary NICU summaries. Experimental results demonstrate a scale-dependent transfer effect: while SFT produced only marginal changes for the 8B model, it substantially improved the 70B model, increasing Macro F1 by 7%. Notably, the quantized fine-tuned 70B model outperformed its full-precision baseline while substantially reducing computational requirements. These findings suggest that sufficient model capacity is critical for preserving semantic flexibility during cross-domain clinical transfer and that efficient quantized adaptation can enable structured provenance modeling for downstream summarization.
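Since every result in the abstract is reported as Macro F1, a minimal sketch of that metric may help: it is the unweighted mean of per-class F1, so rare provenance categories count as much as common ones. The label names below (`physician`, `nursing`, `therapy`) are hypothetical placeholders rather than the MedSecId label set, and averaging over every label seen in either the gold or the predicted sequence is an assumption; the paper's exact evaluation protocol is not given here.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); denom > 0 for any label in the union
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical sentence-level provenance labels, for illustration only.
gold = ["physician", "nursing", "therapy", "nursing"]
pred = ["physician", "nursing", "nursing", "nursing"]
score = macro_f1(gold, pred)
print(f"Macro F1: {score:.3f}")
```

In this toy case one missed `therapy` sentence drives that class's F1 to zero and pulls the average down sharply, which is why Macro F1 is a demanding metric when some disciplines contribute few sentences.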
Problem

Research questions and friction points this paper is trying to address.

clinical provenance
multidisciplinary summarization
sentence-level categorization
NICU
free-text notes
Innovation

Methods, ideas, or system contributions that make the work stand out.

clinical provenance categorization
supervised fine-tuning
large language models
cross-domain generalization
model quantization
Authors

Baris Karacan, University of Illinois Chicago
Vaibhav Bhargava, University of Illinois Chicago
Barbara Di Eugenio, Professor, University of Illinois Chicago (Natural Language Processing; Human Computer Interaction; Educational Technology; NLP for healthcare)
Natalie Parde, University of Illinois Chicago (Natural Language Processing; Artificial Intelligence)
Catherine K. Craven, University of Missouri-Columbia
Karen Dunn Lopez, University of Iowa
Andrew D. Boyd, University of Illinois Chicago
Mary Khetani
Yu-Shan Tseng
Vanessa Barbosa
Julie Vignato
Lindsey Knake
Rajashree Dahal
Emily Spellman
Danielle Hitzel
Janine Petitgout
Kristi Haughey
Amanda Karstens
Brianna Clarahan
Rachel Dawson
Lauren Boyd
Mackenzie Weis
Angie Tipton
Jaewon Bae