Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis

📅 2025-04-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: General-purpose audio foundation models (e.g., AST, PaSST, BEATs) lack systematic evaluation for clinical auscultation tasks such as heart and lung sound analysis.

Method: We conduct the first cross-task benchmarking of these models on four clinically relevant tasks (heart sound classification, respiratory sound classification, abnormal breath detection, and crackle/wheeze identification) using linear probing and lightweight fine-tuning on public datasets. Performance is rigorously compared against state-of-the-art (SOTA) domain-specific models under zero-shot and fine-tuned settings.

Contribution/Results: On high-quality data tasks, foundation models match or exceed SOTA accuracy, outperforming dedicated respiratory sound models by up to 8.2% absolute accuracy. However, they exhibit limited generalization under high-noise conditions. We release a standardized evaluation protocol, unified codebase, and reproducible benchmarks to advance rigorous, clinically grounded assessment of medical audio foundation models.
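As a rough illustration of what linear probing means here, the sketch below trains only a logistic-regression head on top of fixed embeddings. It is not the paper's code: the embeddings are synthetic stand-ins for features from a frozen audio model, and every function name (`make_fake_embeddings`, `train_linear_probe`, `predict`, `accuracy`) is hypothetical.

```python
import numpy as np


def make_fake_embeddings(n_per_class=100, dim=16, seed=0):
    """Synthetic stand-in for frozen foundation-model embeddings (assumption)."""
    rng = np.random.default_rng(seed)
    # Two classes (e.g. "normal" vs "abnormal") drawn around shifted means.
    x0 = rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, dim))
    x1 = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, dim))
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return x, y


def train_linear_probe(x, y, lr=0.1, epochs=200):
    """Fit a logistic-regression head by gradient descent.

    Only the head parameters (w, b) are updated; the embeddings x stay fixed,
    which is what distinguishes linear probing from fine-tuning.
    """
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = np.clip(x @ w + b, -50.0, 50.0)  # clip to keep exp() stable
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - y  # gradient of the cross-entropy loss w.r.t. the logits
        w -= lr * (x.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def predict(x, w, b):
    """Return hard 0/1 predictions from the linear head."""
    return (x @ w + b > 0).astype(int)


def accuracy(pred, y):
    """Fraction of predictions that match the labels."""
    return float((pred == y).mean())


if __name__ == "__main__":
    x, y = make_fake_embeddings()
    # Shuffle, then hold out 20% of the examples for evaluation.
    order = np.random.default_rng(1).permutation(len(y))
    x, y = x[order], y[order]
    split = int(0.8 * len(y))
    w, b = train_linear_probe(x[:split], y[:split])
    print("held-out accuracy:", accuracy(predict(x[split:], w, b), y[split:]))
```

In a real evaluation, the synthetic arrays would be replaced by embeddings extracted once from a frozen pre-trained model, so that differences in probe accuracy reflect the quality of the representations rather than the training of the backbone.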

📝 Abstract

Pre-trained deep learning models, known as foundation models, have become essential building blocks in machine learning fields such as natural language processing and image processing. This trend has extended to respiratory and heart sound models, which have proven effective as off-the-shelf feature extractors. However, their benchmark evaluations have been limited and are not directly comparable with state-of-the-art (SOTA) results, which makes it hard to demonstrate their effectiveness. This study investigates the practical effectiveness of off-the-shelf audio foundation models by comparing their performance on four respiratory and heart sound tasks with SOTA fine-tuning results. Experiments show that the models struggled on two tasks with noisy data but achieved SOTA performance on the other tasks with clean data. Moreover, general-purpose audio models outperformed a respiratory sound model, highlighting their broader applicability. With the insights gained and the released code, we contribute to future research on developing and leveraging foundation models for respiratory and heart sounds.

Problem

Research questions and friction points this paper is trying to address.

Evaluating audio foundation models for heart and respiratory sound analysis

Comparing model performance on noisy vs clean sound data tasks

Assessing general-purpose audio models versus specialized respiratory sound models

Innovation

Methods, ideas, or system contributions that make the work stand out.

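Pre-trained audio foundation models used as off-the-shelf feature extractors for heart and respiratory sound analysis

Comparison against SOTA fine-tuning results across four tasks, reaching SOTA performance on the clean-data tasks

General-purpose audio models outperform a specialized respiratory sound model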
