🤖 AI Summary
Existing respiratory sound event detection methods suffer from three key limitations: (1) reliance on frame-level predictions followed by post-processing, hindering precise boundary modeling; (2) adoption of fixed-length input segments, limiting adaptability to variable-length clinical recordings; and (3) neglect of positional information within the respiratory cycle, which is critical for distinguishing pathological patterns. To address these, we propose an end-to-end framework integrating Graph Neural Networks (GNNs) with anchor-based temporal intervals. The GNN captures fine-grained inter-frame temporal dependencies; learnable anchors directly regress event onset and offset boundaries; and respiratory-phase positional features are explicitly encoded to enhance anomaly discrimination. The architecture natively supports variable-length inputs, improving clinical deployability. Evaluated on SPRSound 2024 and HF Lung V1, our method achieves significant improvements in event detection F1-score and temporal localization accuracy (mAP +8.3%), validating the effectiveness of explicit boundary regression and position-aware modeling.
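The summary states that learnable anchors directly regress event onset and offset boundaries. The paper's exact parameterization is not given here, but the idea can be illustrated with the standard anchor-box decoding scheme (center shift plus log-scale width change) applied to 1-D time intervals. All numbers and names below are hypothetical, a minimal sketch rather than the authors' implementation:

```python
import numpy as np

def decode_anchor_intervals(anchor_centers, anchor_widths, d_center, d_width):
    """Decode regressed offsets into event (onset, offset) boundaries.

    Uses a common anchor-box parameterization (assumed, not from the paper):
    the network predicts a shift of the anchor center, scaled by the anchor
    width, and a log-scale change of the anchor width.
    """
    centers = anchor_centers + d_center * anchor_widths   # shifted midpoints
    widths = anchor_widths * np.exp(d_width)              # rescaled durations
    onsets = centers - widths / 2.0
    offsets = centers + widths / 2.0
    return np.stack([onsets, offsets], axis=-1)           # (num_anchors, 2)

# One anchor per frame of a variable-length clip (times in seconds).
anchor_centers = np.array([0.5, 1.5, 2.5])    # anchor midpoints
anchor_widths  = np.array([1.0, 1.0, 1.0])    # anchor durations
d_center = np.array([0.0, 0.1, -0.2])         # hypothetical regression outputs
d_width  = np.array([0.0, np.log(2.0), 0.0])

intervals = decode_anchor_intervals(anchor_centers, anchor_widths,
                                    d_center, d_width)
# First anchor is unchanged: [0.0, 1.0]; second widens to [0.6, 2.6].
```

Because the anchors are defined per frame, this decoding applies to any input length, which is consistent with the variable-length support the summary describes.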
📝 Abstract
Auscultation is a key method for early diagnosis of respiratory and pulmonary diseases, but it relies on skilled healthcare professionals and is often subjective, with variability between experts. As a result, numerous deep learning-based automatic classification methods have emerged, most of which focus on respiratory sound classification. In contrast, research on respiratory sound event detection remains limited. Existing sound event detection methods typically rely on frame-level predictions followed by post-processing to generate event-level outputs, making interval boundaries challenging to learn directly. Furthermore, many approaches can only handle fixed-length audio, limiting their applicability to variable-length respiratory sounds. Additionally, the impact of respiratory sound location information on detection performance has not been extensively explored. To address these issues, we propose a graph neural network-based framework with anchor intervals, capable of handling variable-length audio and providing more precise temporal localization of abnormal respiratory sound events. Our method improves both the flexibility and applicability of respiratory sound detection. Experiments on the SPRSound 2024 and HF Lung V1 datasets demonstrate the effectiveness of the proposed approach, and incorporating respiratory position information enhances the discrimination between abnormal sounds.