🤖 AI Summary
This study addresses the incoherent summaries that result from directly pooling multidisciplinary free-text notes in high-complexity clinical settings such as the NICU, where accurate sentence-level source classification is needed before summarization. The authors propose a clinical source classification pipeline based on supervised fine-tuning (SFT) of Llama-3 large language models (8B and 70B parameters). The models are fine-tuned on MedSecId, a corpus of annotated MIMIC-III notes, and evaluated for cross-domain generalization on a gold-standard NICU dataset. The findings highlight the role of model capacity in cross-domain transfer: SFT yields only marginal changes for the 8B model but raises the 70B model's Macro F1 by 7% on the NICU evaluation, and the quantized fine-tuned 70B model outperforms its full-precision baseline while substantially reducing computational requirements. In-domain, on MedSecId, both models attain a Macro F1 above 92%.
📝 Abstract
Effective "all-team" summarization in high-complexity settings like the Neonatal Intensive Care Unit (NICU) requires aggregating insights from diverse disciplines (physicians, nurses, therapists) spread across hundreds of clinical free-text notes. Simply pooling heterogeneous text often leads to incoherent outputs. Structured summarization therefore first requires accurate categorization of sentence-level provenance across multi-source notes. This pilot study introduces a clinical provenance categorization pipeline using supervised fine-tuning (SFT) of large language models (LLMs). We adapted two Llama-3 models (8B and 70B) to MedSecId, a corpus of 2,002 MIMIC-III (Adult ICU) notes annotated with clinical provenance headers, achieving in-domain Macro F1 scores above 92% for both models. To evaluate cross-domain generalization, we assessed model capacity (8B vs. 70B) and quantization on a gold-standard dataset of 227 sentence-level spans derived from three multi-disciplinary NICU summaries. Experimental results demonstrate a scale-dependent transfer effect: while SFT produced only marginal changes for the 8B model, it substantially improved the 70B model, increasing Macro F1 by 7%. Notably, the quantized fine-tuned 70B model outperformed its full-precision baseline while substantially reducing computational requirements. These findings suggest that sufficient model capacity is critical for preserving semantic flexibility during cross-domain clinical transfer and that efficient quantized adaptation can enable structured provenance modeling for downstream summarization.
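All results above are reported as Macro F1, which averages per-class F1 without weighting by class frequency, so rare provenance categories count as much as common ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The label names (`physician`, `nursing`, `therapy`) and the toy predictions are hypothetical and are not taken from MedSecId or the NICU dataset.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical sentence-level provenance labels, for illustration only.
gold = ["physician", "physician", "nursing", "nursing", "therapy", "therapy"]
pred = ["physician", "nursing", "nursing", "nursing", "therapy", "physician"]
print(round(macro_f1(gold, pred), 3))
```

In this toy case the per-class F1 values are 0.5 (physician), 0.8 (nursing), and about 0.667 (therapy), so the macro average is about 0.656. Because every class contributes equally, a model that ignores a small category is penalized even when its overall accuracy is high.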