🤖 AI Summary
This work addresses the challenge of precisely separating mixed cardiorespiratory (heart and lung) sounds for clinical auscultation diagnosis. We propose the first LLM-NMF collaborative framework, integrating large language models (LLMs) with non-negative matrix factorization (NMF). The LLM injects disease-specific semantic priors and, through closed-loop feedback, dynamically adjusts a fundamental-frequency penalty term in the NMF loss function, jointly optimizing source separation and medical reasoning. Evaluated on 100 synthesized mixtures of real measurements and 210 heart and lung sound recordings from a clinical manikin, the method achieves significant improvements over state-of-the-art approaches in both separation quality (e.g., SI-SNR, SDR) and downstream disease classification accuracy. By placing an LLM inside the NMF optimization loop, the work establishes a “separation–reasoning–feedback” closed loop that moves medical audio analysis toward interpretability and adaptability.
📝 Abstract
This study is the first to integrate large language models (LLMs) with non-negative matrix factorization (NMF), a novel advance in the source separation field. The LLM is used in two distinct ways: it enhances the separation results by providing detailed insights for disease prediction, and it operates in a feedback loop to optimize a fundamental-frequency penalty added to the NMF cost function. We tested the algorithm on two datasets: 100 synthesized mixtures of real measurements, and 210 recordings of heart and lung sounds, both individual and mixed, captured from a clinical manikin with a digital stethoscope. The approach consistently outperformed existing methods, demonstrating its potential to significantly improve medical sound analysis for disease diagnosis.
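The abstract does not specify the exact form of the fundamental-frequency penalty or how the LLM's feedback becomes a new penalty weight. The code below is a minimal sketch under stated assumptions, not the authors' implementation. It uses Euclidean NMF with Lee-Seung-style multiplicative updates. The penalty is modeled as a hypothetical band mask `M`, which charges heart components for spectral energy above an assumed cutoff `f_cut` and lung components for energy below it. In place of the LLM, a hypothetical rule-based `stand_in_critic` doubles or halves the penalty weight `lam` according to how much heart-basis energy leaks out of band. All function names, the 150 Hz cutoff, the 0.05 leakage target, and the synthetic spectra are illustrative assumptions.

```python
import numpy as np


def band_masks(freqs, n_heart, n_lung, f_cut):
    """Hypothetical penalty mask M (F x K): 1 marks bins where a component should NOT carry energy.

    ASSUMPTION: heart components (first n_heart columns) are penalized above f_cut,
    lung components (remaining columns) below it. This stands in for the paper's
    fundamental-frequency penalty, whose exact form is not given in the abstract.
    """
    above = (freqs > f_cut).astype(float)[:, None]           # shape (F, 1)
    heart = np.repeat(above, n_heart, axis=1)
    lung = np.repeat(1.0 - above, n_lung, axis=1)
    return np.hstack([heart, lung])


def penalized_nmf(V, M, lam, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative updates for 0.5 * ||V - W H||_F^2 + lam * sum(M * W)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], M.shape[1])) + eps
    H = rng.random((M.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # The penalty gradient lam * M is nonnegative, so it joins the denominator.
        W *= (V @ H.T) / (W @ H @ H.T + lam * M + eps)
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return W, H


def leakage(W, M, n_heart, eps=1e-12):
    """Mean fraction of each heart basis vector's energy that lies in penalized bins."""
    Wh = W[:, :n_heart]
    Wn = Wh / (Wh.sum(axis=0, keepdims=True) + eps)          # scale-invariant per column
    return float((M[:, :n_heart] * Wn).sum(axis=0).mean())


def stand_in_critic(leak, target=0.05):
    """HYPOTHETICAL placeholder for the LLM feedback step: returns a factor for lam."""
    return 2.0 if leak > target else 0.5


def closed_loop(V, freqs, n_heart=2, n_lung=2, f_cut=150.0, lam=0.1, rounds=5):
    """Alternate separation and feedback; history records (lam used, resulting leakage)."""
    M = band_masks(freqs, n_heart, n_lung, f_cut)
    history = []
    for _ in range(rounds):
        W, H = penalized_nmf(V, M, lam)
        leak = leakage(W, M, n_heart)
        history.append((lam, leak))
        lam *= stand_in_critic(leak)
    return W, H, history


def bump(freqs, centre, width):
    """Gaussian spectral template used only to build synthetic test data."""
    return np.exp(-0.5 * ((freqs - centre) / width) ** 2)


# Synthetic stand-in for a magnitude spectrogram: two low-frequency "heart"
# templates and two higher-frequency "lung" templates (illustrative values only).
rng = np.random.default_rng(1)
freqs = np.linspace(0.0, 1000.0, 64)                          # hypothetical bin centres, Hz
W_true = np.stack([bump(freqs, 50, 20), bump(freqs, 100, 25),
                   bump(freqs, 350, 80), bump(freqs, 600, 120)], axis=1)
H_true = rng.random((4, 100))
V = W_true @ H_true

W, H, history = closed_loop(V, freqs)
for lam_used, leak in history:
    print(f"lam={lam_used:.3f}  heart leakage={leak:.3f}")
```

Because the penalty term is linear in `W` and its gradient `lam * M` is nonnegative, it appears only in the denominator of the `W` update. This is the usual way an additive penalty is folded into multiplicative NMF updates. In the framework the abstract describes, an LLM fills the critic's role, and the abstract does not specify its prompt or output format.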