Cough activity detection for automatic tuberculosis screening

📅 2026-03-11
🤖 AI Summary
This study addresses the challenge of automatically locating the start and end points of coughs in audio for large-scale tuberculosis (TB) screening. It applies two pretrained transformer architectures, XLS-R and an audio spectrogram transformer (AST), to cough activity detection on real-world recordings of patients symptomatic for TB who self-presented at community-level care centres in South Africa and Uganda. Using only the first three layers of XLS-R gives the best average precision while also reducing computational and memory requirements, which matters for smartphone deployment. On the test set, XLS-R attains an average precision of 0.96 and a ROC-AUC of 0.99, outperforming AST and a logistic regression baseline by 9% and 27% absolute in average precision, respectively. A downstream TB classifier trained on coughs automatically isolated by XLS-R outperforms one trained on AST-isolated coughs and is only narrowly outperformed by one trained on ground-truth coughs, indicating that integrating such a model into a screening tool is feasible.
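
As an illustration of what locating cough start and end points involves, the sketch below turns a sequence of frame-level cough probabilities into time segments. It is not the authors' implementation: the function name `probs_to_segments`, the 0.5 threshold, the 20 ms frame hop, and the minimum-duration filter are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def probs_to_segments(
    probs: Sequence[float],
    threshold: float = 0.5,      # assumed decision threshold
    frame_hop_s: float = 0.02,   # assumed 20 ms between frames
    min_duration_s: float = 0.0  # assumed optional filter for very short events
) -> List[Tuple[float, float]]:
    """Convert frame-level cough probabilities into (start, end) times in seconds."""
    frame_segments = []
    start = None
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i                          # cough onset
        elif not active and start is not None:
            frame_segments.append((start, i))  # cough offset (exclusive frame index)
            start = None
    if start is not None:
        # The recording ended while a cough was still active.
        frame_segments.append((start, len(probs)))

    return [
        (s * frame_hop_s, e * frame_hop_s)
        for s, e in frame_segments
        if (e - s) * frame_hop_s >= min_duration_s
    ]
```

In practice the threshold and minimum duration would be tuned on held-out data rather than fixed in advance.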

📝 Abstract
The automatic identification of cough segments in audio through the determination of start and end points is pivotal to building scalable screening tools in health technologies for pulmonary-related diseases. We propose the application of two current pre-trained architectures to the task of cough activity detection. A dataset of recordings containing cough from patients symptomatic for tuberculosis (TB) who self-present at community-level care centres in South Africa and Uganda is employed. When automatic start and end points are determined using XLS-R, an average precision of 0.96 and an area under the receiver operating characteristic curve of 0.99 are achieved for the test set. We show that the best average precision is achieved by utilising only the first three layers of the network, which has the dual benefit of reduced computational and memory requirements, both important for smartphone-based applications. This XLS-R configuration is shown to outperform an audio spectrogram transformer (AST) as well as a logistic regression baseline by 9% and 27% absolute in test set average precision, respectively. Furthermore, a downstream TB classification model trained using the coughs automatically isolated by XLS-R comfortably outperforms a model trained on the coughs isolated by AST, and is only narrowly outperformed by a classifier trained on the ground truth coughs. We conclude that the application of large pre-trained transformer models is an effective approach to identifying cough end-points and that the integration of such a model into a screening tool is feasible.
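
For readers unfamiliar with the headline metric, the following is a minimal sketch of how average precision can be computed from frame-level labels and scores, using the common step-wise definition (the sum of precision at each threshold weighted by the increase in recall). The function name and the toy data are illustrative assumptions; the abstract does not specify which implementation the authors used. An "absolute" improvement of 9% in this metric corresponds to a difference of 0.09 in the value this function returns.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise average precision: sum over thresholds of (R_n - R_{n-1}) * P_n."""
    total_pos = sum(1 for y in labels if y)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positive labels")

    # Rank frames from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        # Frames with identical scores share one threshold, so process them together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Toy example (made-up values): 1 = cough frame, 0 = non-cough frame.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```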
Problem

Research questions and friction points this paper is trying to address.

cough detection
tuberculosis screening
audio segmentation
end-point detection
health technology
Innovation

Methods, ideas, or system contributions that make the work stand out.

cough activity detection
XLS-R
tuberculosis screening
pre-trained transformer
smartphone-based health technology
Joshua Jansen van Vüren
Department of Electrical and Electronic Engineering, Stellenbosch University, Stellenbosch, South Africa
Devendra Singh Parihar
Department of Electrical and Electronic Engineering, Stellenbosch University, Stellenbosch, South Africa
Daphne Naidoo
DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
Kimsey Zajac
Faculty of Business and Economics, University of Göttingen, Göttingen, Germany
Willy Ssengooba
Department of Medical Microbiology, College of Health Sciences, Makerere University, Biomedical Research Center (MAKBRC), Kampala, Uganda
Grant Theron
DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
Thomas Niesler
Department of Electrical and Electronic Engineering, University of Stellenbosch, South Africa
Speech recognition
Human language technology
Pattern recognition