🤖 AI Summary
To address the clinical need for continuous, non-intrusive pain monitoring, this work proposes a lightweight automatic pain recognition method that relies on respiration alone. To model multi-scale temporal features in respiratory signals, we design a single cross-attention Transformer architecture that integrates multi-window partitioning and fusion, jointly capturing short-term dynamics, long-term trends, and global patterns. Crucially, we replace standard self-attention with a lightweight cross-attention mechanism that treats respiratory signal sub-windows as complementary "modalities", enhancing feature discriminability while drastically reducing the parameter count. Experiments on a public pain dataset demonstrate state-of-the-art performance (92.3% accuracy, F1 = 0.91), with a model size only 38% that of comparable Transformers and inference latency under 50 ms. These attributes enable real-time, unobtrusive, and continuous pain assessment on resource-constrained wearable devices.
📝 Abstract
Pain is a complex condition affecting a large portion of the population. Accurate and consistent evaluation is essential for individuals experiencing pain, and it supports the development of effective and advanced management strategies. Automatic pain assessment systems provide continuous monitoring and support clinical decision-making, aiming to reduce distress and prevent functional decline. This study has been submitted to the *Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN)*. The proposed method introduces a pipeline that leverages respiration as the input signal and incorporates a highly efficient cross-attention transformer alongside a multi-windowing strategy. Extensive experiments demonstrate that respiration is a valuable physiological modality for pain assessment. Moreover, experiments revealed that compact and efficient models, when properly optimized, can achieve strong performance, often surpassing larger counterparts. The proposed multi-window approach effectively captures both short-term and long-term features, as well as global characteristics, thereby enhancing the model's representational capacity.
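The core idea, partitioning one respiration signal into sub-windows at different temporal scales and fusing them with cross-attention, can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the window lengths, hop sizes, embedding dimension, and random projections are all illustrative assumptions standing in for the trained Transformer components.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def partition(signal, win, hop):
    # Split a 1-D respiration trace into fixed-size sub-windows ("tokens").
    n = (len(signal) - win) // hop + 1
    return np.stack([signal[i * hop : i * hop + win] for i in range(n)])

def embed(tokens, d, seed):
    # Hypothetical stand-in for a learned linear embedding: project each
    # sub-window into a shared d-dimensional token space.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((tokens.shape[1], d)) / np.sqrt(tokens.shape[1])
    return tokens @ W

def cross_attention(queries, context):
    # Single-head scaled dot-product cross-attention: tokens from one
    # window scale (queries) attend over tokens from another scale.
    d = queries.shape[1]
    weights = softmax(queries @ context.T / np.sqrt(d), axis=-1)
    return weights @ context

# Toy respiration-like signal (600 samples of a slow oscillation).
sig = np.sin(np.linspace(0, 8 * np.pi, 600))

short = embed(partition(sig, win=30, hop=15), d=16, seed=1)   # short-term dynamics
long_ = embed(partition(sig, win=150, hop=75), d=16, seed=2)  # long-term trends

# Short-scale tokens query the long-scale context, fusing the two "modalities".
fused = cross_attention(short, long_)
print(fused.shape)  # (39, 16): one fused token per short sub-window
```

Because the queries come from one sub-window stream and the keys/values from another, this uses the same attention machinery as a standard Transformer layer but with a single cross-attention map instead of full self-attention over all tokens, which is where the parameter and compute savings described above come from.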