🤖 AI Summary
This study addresses the early identification of respiratory distress from video to enable timely clinical intervention. We propose an approach based on the Video Vision Transformer (ViViT) that performs a temporal ordering task on short video clips captured during the recovery phase after intense exercise in healthy volunteers, thereby capturing dynamic changes in respiratory state. The method integrates Lie Relative Encodings (LieRE) and a motion-guided masking mechanism with an embedding-based comparison strategy to model subtle respiratory mechanics. The proposed approach achieves an F1 score of 0.81 on respiratory distress detection, demonstrating the potential of video transformers for non-contact monitoring of respiratory status.
📝 Abstract
Recognition of respiratory distress through visual inspection is a life-saving clinical skill. Clinicians can detect early signs of respiratory deterioration, creating a valuable window for earlier intervention. In this study, we evaluate whether recent advances in video transformers enable artificial intelligence systems to recognize signs of respiratory distress from video. We collected videos of healthy volunteers recovering after strenuous exercise and used each participant's natural respiratory recovery to create a labeled dataset for respiratory distress. We split each video into short clips, with earlier clips corresponding to greater shortness of breath, and designed a temporal ordering challenge to assess whether an AI system can detect respiratory distress. We found that a ViViT encoder augmented with Lie Relative Encodings (LieRE) and Motion Guided Masking, combined with an embedding-based comparison strategy, achieves an F1 score of 0.81 on this task. Our findings suggest that modern video transformers can recognize subtle changes in respiratory mechanics.
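The temporal ordering setup described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: `encode_clip` stands in for the ViViT encoder (with LieRE and motion-guided masking), and `predicts_earlier` stands in for the embedding-based comparison, here reduced to projecting the embedding difference onto a learned direction. All names, shapes, and the pooling scheme are illustrative assumptions.

```python
import numpy as np

def encode_clip(clip: np.ndarray) -> np.ndarray:
    """Stand-in for a video encoder: map a (T, H, W) clip to a D-dim embedding.
    The real system would use a ViViT encoder with LieRE positional encodings
    and motion-guided masking; here we just flatten frames and pool over time."""
    return clip.reshape(clip.shape[0], -1).mean(axis=0)

def predicts_earlier(emb_a: np.ndarray, emb_b: np.ndarray, w: np.ndarray) -> bool:
    """Hypothetical pairwise comparison head: project the embedding difference
    onto a learned 'distress direction' w. A positive score means clip A is
    predicted to show more distress, i.e. to come earlier in recovery."""
    return float(w @ (emb_a - emb_b)) > 0.0
```

In the toy example below, the "earlier" clip simply has higher pixel intensity, and `w` is a uniform direction, so the comparison reduces to comparing mean intensity; in the real task, both the encoder and `w` would be learned from the ordered clip pairs.

```python
clip_early = np.full((8, 4, 4), 2.0)  # synthetic clip: more apparent "distress"
clip_late = np.full((8, 4, 4), 1.0)   # synthetic clip: recovered state
w = np.ones(16)                       # toy stand-in for a learned direction

print(predicts_earlier(encode_clip(clip_early), encode_clip(clip_late), w))  # → True
```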